CHAPTER 1
INTRODUCTION

The  design  of  a  decentralized  control  scheme  for  a  large  scale 
system  will,  in  the  formulation  of  the  problem,  generally  take  advantage 
of  some  aspect  of  the  structure  of  the  system  [16,17,18]  or  the  problem 
may  be  formulated  so  that  a  desired  structure  is  imposed  on  the  system,  e.g. 
[19,20,21,12].

Controllers are generally implemented in some form of state feedback.
Since many aspects of the system structure are variant under the
control actions, the dependence of the decentralized control schemes on the
structure makes many of the schemes crucially dependent on a uniformity of
the goals of the individual decision makers.

In many situations, the individual decision makers will have
different goals and it may be infeasible to have cooperation in agreeing on
a common, single goal. So, given that there may exist multiple goals, it
is of interest to analyze the decentralized control problem in this setting.

We will examine issues arising from the presence of conflicting
goals among the decentralized controllers. The rational behavior of the
controllers is characterized by a strategy defining the rules of their
behavior. Of primary interest to us will be the role that the Stackelberg
strategy can play in the decentralized control problem and a number of
conceptual issues that arise in attempting to make use of the strategy.

The  Stackelberg  strategy  is  well  suited  for  use  in  designing  a 
coordination  scheme  where  there  are  many  controllers  acting  on  the  system, 
each with a different criterion to be optimized. There are, however, some
issues regarding the strategy which have yet to be resolved. Among these
are the fact that the principle of optimality does not in general hold and
that its imposition for the continuous time case has yet to be satisfactorily
dealt with. Also, unlike the classic single criterion linear quadratic
problem, neither a closed form solution satisfying the necessary conditions
nor a satisfactory numerical solution technique has yet been developed. In
Chapter 2 we develop a sampled data equilibrium strategy which provides
a computationally tractable solution technique.

This coordination technique is prescriptive, i.e., if a solution
exists, it provides the methodology for calculating it. The existence of
a solution is not assured. In an effort to establish conditions under
which we can ensure the existence of a solution satisfying the Stackelberg
strategy, Chapter 3 will examine a very basic form of the Stackelberg
strategy for dynamic games, and sufficient conditions for the existence of a
stabilizing solution will be developed. We restrict our attention to a
formulation dealing with a linear continuous time system and in which the
control laws are constrained to be linear state feedback. For this class
of problems we are able to rely on the concepts of linear algebra to analyze
the interaction of the individual decision makers' controllable and
observable subspaces. By so doing we establish sufficient conditions under which
the existence of a stabilizing solution can be assured.

Another  form  of  the  Stackelberg  strategy  will  be  seen  in  the 
remaining  chapters,  entering  into  the  design  of  an  information  structure. 

The problem considered is one in which there are many controllers acting
on a system where each controller has a different objective and their controls
are determined according to the Nash equilibrium strategy. An example
demonstrating the impact of the information structure is considered in
Chapter 4. This is an example of a situation in which the availability
of more information to one of the controllers has the effect of making
that controller worse off. The demonstrated impact of the information
structure in the example serves as motivation for the information structure
design scheme of the next chapter.

In Chapter 5 we consider the design of an improved information
structure by a somewhat unexpected use of the Stackelberg strategy. An
iterative  procedure  is  developed  by  which  the  information  structure  is 
altered  to  improve  the  overall  system  performance.  The  advantages  inherent 
in  the  precedence  nature  of  decision  making  under  the  Stackelberg  strategy 
will  be  seen  in  this  formulation. 
CHAPTER 2

SAMPLED DATA EQUILIBRIUM STACKELBERG COORDINATION
2.1. Introduction

In  this  section,  we  consider  the  problem  of  formulating  a 
hierarchical  control  structure  for  a  multicontroller  problem  using  the 
differential  game  concept  of  an  equilibrium  Stackelberg  strategy.  It  is 
assumed  that  in  general  each  agent  has  a  different  objective  function  and 
that  one  agent,  the  coordinator  and  Stackelberg  leader,  has  an  overall 
objective  function. 

There have been numerous investigations recently into the usefulness
and characteristics of the Stackelberg strategy applied to dynamic
systems [1-11]. In particular, the use of the Stackelberg strategy for the
coordination of many agents has been considered in [4] and [11].

A form of periodic coordination has been considered by Chong and
Athans [12] in which the vertical communication in the hierarchy is
constrained to be periodic. Our basic assumptions are different from those of
[12] and consequently the nature of the solutions is quite dissimilar.

With a Stackelberg strategy, we assume it is known that one player,
the  coordinator  and  Stackelberg  leader,  will  determine  his  controls  before 
any  of  the  other  players  (followers  or  lower  level  decisionmakers).  The 
lower  level  decisionmakers  then  perform  their  optimization  subject  to  their 
knowledge  of  the  coordinator's  decision,  that  is,  they  are  reacting  to  his 
decision.  The  followers  act  simultaneously  and  we  consider  the  case  when 
they  play  a  Nash  strategy  among  themselves.  The  leader  performs  his 
optimization  subject  to  the  expected  reactions  of  the  followers.  The  leader's 
ability to make decisions first, taking into account the reactions of the
lower  level  decisionmakers,  enables  him,  to  a  degree,  to  impose  his 
criterion  onto  the  other  controllers. 
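
The order of play described above can be illustrated with a small static game. The following is a minimal sketch, not the dynamic formulation of this chapter: it assumes one leader and a single follower with scalar controls, and the two quadratic criteria (`leader_cost`, `follower_cost`) are hypothetical, chosen only so that the result can be checked by hand. The leader minimizes its cost along the follower's reaction curve, and the outcome is compared with the case where neither player anticipates the other.

```python
def argmin_convex(cost, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the minimizer of a convex scalar function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if cost(a) < cost(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


# Hypothetical scalar criteria, chosen only for illustration.
def follower_cost(u0, u1):
    return (u1 - (1.0 - 0.5 * u0)) ** 2


def leader_cost(u0, u1):
    return u0 ** 2 + (u1 - 2.0) ** 2


def follower_reaction(u0):
    """Follower's best reply to an announced leader control u0."""
    return argmin_convex(lambda u1: follower_cost(u0, u1))


def stackelberg_solution():
    """Leader optimizes while anticipating the follower's reaction."""
    u0 = argmin_convex(lambda u: leader_cost(u, follower_reaction(u)))
    return u0, follower_reaction(u0)


def nash_solution(rounds=20):
    """Alternating best replies; neither player anticipates the other."""
    u0, u1 = 0.0, 0.0
    for _ in range(rounds):
        u0 = argmin_convex(lambda u: leader_cost(u, u1))
        u1 = follower_reaction(u0)
    return u0, u1
```

For these particular criteria the follower's reaction is u1 = 1 - 0.5 u0, so the leader's cost along the reaction curve is 1.25 u0^2 + u0 + 1, minimized at u0 = -0.4 with cost 0.8. Without anticipation the leader plays u0 = 0 and incurs cost 1.0, which shows in a small way how deciding first lets the leader impose its criterion.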

This  strategy  is  appropriate  for  imposing  a  control  structure 
on  a  problem  in  which  there  are  many  decision  makers  with  different 
criteria  unable  or  unwilling  to  cooperate  in  their  decision  making  process 
and  in  which  a  hierarchy  of  decision  making  already  exists  or  can  be 
imposed. 

The  solution  of  the  closed  loop  Stackelberg  problem  generally 
depends  on  the  length  of  the  interval  over  which  the  problem  is  defined  as 
well  as  the  state  of  the  system  at  the  initial  time  [8].  Implicit  in  the 
solution  is  a  guarantee  by  the  leader  that  he  will  not  deviate  from  his 
announced control rule. If the problem is redefined on a subinterval of
the  time  interval  of  the  original  problem,  the  solutions  on  this  interval 
would  in  general  be  different.  Thus,  the  principle  of  optimality  does  not, 
in  general,  hold. 

The  feedback  Stackelberg  strategy  is  defined  as  a  closed  loop 
Stackelberg  strategy  which  has  the  added  constraint  that  the  leader's  control 
is  required  to  satisfy  the  principle  of  optimality  [8].  Generalization  to 
equilibrium  Stackelberg  strategies  is  introduced  in  [7].  Further  discussion 
of  the  Nash  and  Stackelberg  strategies  for  dynamic  games  can  be  found  in 
the  references. 

The  open  loop,  closed  loop,  feedback  and  sampled  data  Stackelberg 
strategies  exhibit  notably  different  characteristics  due  to  the  fact  that 
they  are  based  on  fundamentally  different  problem  formulations.  In  order 
to see the motivation and significance of the sampled data formulation it
is  necessary  to  appreciate  two  particular  aspects  of  the  continuous  time 
Stackelberg  problem. 

First,  as  described  above,  the  solution  of  the  closed  loop 
Stackelberg  problem  for  dynamic  games  does  not,  in  general,  satisfy  the 
principle  of  optimality.  The  imposition  of  the  principle  of  optimality  for 
discrete  time  games  has  been  considered  in  [8]  while  the  procedure  for  doing 
this  for  continuous  time  games  has  yet  to  be  resolved. 

A  second  peculiarity  of  the  closed  loop  Stackelberg  problem  is 
that,  unlike  the  classic  single  agent,  linear  quadratic  control  problem,  or 
even  certain  multicontroller  problems,  the  necessary  conditions  derived  by 
the  variational  technique  for  the  linear  quadratic,  continuous  time,  closed 
loop  Stackelberg  problem  result  in  a  nonlinear  control,  the  existence  of 
which is not assured [6], [16].

With  these  aspects  of  the  continuous  time  Stackelberg  problem  in 
mind,  the  significance  of  the  sampled  data  formulation  is  apparent.  That 
is,  the  resultant  control  laws  are  piecewise  continuous  linear  time  varying 
functions  of  the  measurements  for  the  linear  quadratic  case  and,  as  we  have 
formulated  it,  the  principle  of  optimality  holds  at  the  sampling  times. 

Recent  work  on  the  Stackelberg  strategy  for  continuous  time  dynamic 
systems  has  concentrated  primarily  on  the  open  loop  formulation  [4]  and  on 
the  linearly  constrained  closed  loop  formulation  [6].  For  the  linear 
quadratic  case,  the  open  loop  solution  is  a  linear  function  of  the  initial 
condition  and  the  solution  in  [6]  is  linear  by  construction  but  the 
principle of optimality does not, in general, hold. The linear form of the
sampled data solution is a direct result of this information constraint and
is not due to any structural (linear) constraint being imposed on the form
of the solution.


By  considering  the  sampled  data  formulation  we  have  been  able  to 
obtain  a  responsive  state  feedback  solution,  which  is  tractable,  has  a 
very simple form for implementation, and for which the principle of optimality
holds at the sampling times. Of equal importance is the existence of
an  efficient  algorithm  for  the  calculation  of  this  solution.  In  this  chapter 
we  derive  a  computationally  efficient  technique  for  obtaining  the  solution 
for  the  linear  quadratic  sampled  data  equilibrium  Stackelberg  strategy. 

The  solution  algorithm  tends  (i)  to  minimize  the  on-line  computations  and 
(ii)  to  take  advantage  of  the  nature  of  the  sampled  data  solution  to  greatly 
reduce  the  horizon  over  which  integrations  must  be  performed,  thereby 
reducing  off-line  computations  as  well.  These  features  are  obtained  as  a 
result of employing a form of invariant imbedding [13].

In  Section  2.2  we  formulate  the  problem  and  present  necessary 
conditions  for  the  solution.  The  linear  quadratic  case  will  be  considered 
in  Section  2.3  and  techniques  for  the  solution  of  the  linear  quadratic  case 
will  be  discussed  in  Section  2.4.  Section  2.5  summarizes  the  results. 

2.2.  Sampled  Data  Equilibrium  Stackelberg  Formulation 
Consider  the  system 

$$\dot{x} = f(x, u_i;\ i = 0,1,\ldots,m), \qquad x(t_0) = x_0, \tag{2.1}$$

$u_i \in R^{r_i}$, $x \in R^n$, where $r_i$ is the dimension of the ith control vector. Each
lower level control, $u_i$, for $i = 1,\ldots,m$, is chosen to reduce as much as
possible the scalar index

$$J_i = K_{if}(x(t_f)) + \int_{t_0}^{t_f} L_i(x, u_j;\ j = 0,1,\ldots,m)\,dt. \tag{2.2}$$

The coordinator's control, $u_0$, is chosen to reduce as much as possible
the scalar index

$$J_0 = K_{0f}(x(t_f)) + \int_{t_0}^{t_f} L_0(x, u_i;\ i = 0,1,\ldots,m)\,dt. \tag{2.3}$$

The terminal time, $t_f$, is fixed.

The information is assumed to be in the form of sampled data
acquisition, that is, measurements are taken at r discrete instances in
time $\{t_j \in [t_0, t_f),\ j = 0,1,\ldots,r-1\}$. The controls will be functions of time
and the latest state measurement, i.e., $u_i = u_i(t, x_j)$ for $t_j \le t < t_{j+1}$, for
all i, where $x_j = x(t_j)$.
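
The information pattern can be made concrete with a short sketch. It assumes the sampling times are stored in an increasing list and that each control law is available as a function of time and of the latest measurement; the names `latest_sample_index`, `sampled_data_control`, `law`, and `measurements` are hypothetical and serve only to illustrate the lookup of the interval containing t.

```python
from bisect import bisect_right


def latest_sample_index(sample_times, t):
    """Return j such that sample_times[j] <= t < sample_times[j + 1].

    The last interval is taken to run from the final sampling time onward.
    """
    if t < sample_times[0]:
        raise ValueError("t precedes the first sampling time")
    return bisect_right(sample_times, t) - 1


def sampled_data_control(law, sample_times, measurements, t):
    """Evaluate u(t, x_j), where x_j is the most recent state measurement."""
    j = latest_sample_index(sample_times, t)
    return law(t, measurements[j])
```

Between sampling times the control may still vary with t, but it sees the state only through the measurement taken at the start of the current interval.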

The leader will calculate and announce $u_0(t, x_j)$ for $t \in [t_j, t_{j+1})$
and $j = 0,1,\ldots,r-1$ at the beginning of the game. This control is chosen
to minimize the leader's performance index under the assumption that the
followers will in turn be minimizing their respective performance indices
subject to the announced leader's control, and subject to the requirement
that the leader's control remains optimal for any game starting at $t_j$,
$j = 0,1,\ldots,r-1$. The controls are calculated based on the assumption that
future measurements will be available at $t_k$, $k = j+1,\ldots,r-1$.

In contrast to the single controller case or even certain
multicontroller strategies, the Stackelberg controls, including sampled data
closed loop controls, do not in general satisfy the principle of optimality
[8]. In this section we derive necessary conditions for sampled data
equilibrium Stackelberg strategies whereby the principle of optimality is
imposed at the sampling times $t_j$, $j = 0,1,\ldots,r-1$.

Let the optimum costs to go at time $t_j$ be denoted by $V_i(x(t_j), t_j)$,
$i = 0,1,\ldots,m$. Imposing the principle of optimality we have

$$V_i(x_j, t_j) = \min_{u_i} \Big\{ V_i(x_{j+1}, t_{j+1}) + \int_{t_j}^{t_{j+1}} L_i(x, u_k;\ k = 0,1,\ldots,m)\,dt \Big\} \tag{2.4}$$

where

$$V_i(x(t_f), t_f) = K_{if}(x(t_f)), \qquad i = 0,1,\ldots,m \tag{2.5}$$


and where the minimization with respect to $u_i$ in (2.4) is subject to the
system constraint and to the minimization being performed by the other
controllers according to the strategy outlined in the preceding paragraphs.
Note that the optimizations of the future periods are imbedded in the term
$V_i(x_{j+1}, t_{j+1})$. Also notice that at sample time $t_j$, all controls from $t_j$
through $t_f$ will, in principle, be calculated and that they are independent
of any control action prior to $t_j$ except for the effect of $x_j$. So, by the
nature of the problem formulation, the solution will satisfy the principle
of optimality at the sampling times. This aspect of the sampled data
formulation is analogous to the feedback formulation of [8] or the
equilibrium formulation of [7].

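The variational method is applied to (2.1), (2.4), and (2.5), to
obtain the necessary conditions. These conditions are an extension of those
derived in [11]. The necessary conditions for the followers on $[t_j, t_{j+1})$
for $i = 1,\ldots,m$ are

$$\dot{x} = f(x, u_i;\ i = 0,1,\ldots,m), \qquad x(t_j) = x_j \tag{2.6}$$

$$\dot{p}_i' = -\frac{\partial H_i}{\partial x}, \qquad p_i'(t_{j+1}) = \frac{\partial V_i(x(t_{j+1}), t_{j+1})}{\partial x(t_{j+1})} \tag{2.7}$$

$$0 = \frac{\partial H_i}{\partial u_i} \tag{2.8}$$

where

$$H_i(x, p_i, u_k;\ k = 0,1,\ldots,m) = L_i(x, u_k;\ k = 0,1,\ldots,m) + p_i' f(x, u_k;\ k = 0,1,\ldots,m). \tag{2.9}$$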
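The necessary conditions for the leader on $[t_j, t_{j+1})$ are

$$\dot{\lambda}' = -\frac{\partial H_0}{\partial x}, \qquad \lambda'(t_{j+1}) = \frac{\partial V_0(x(t_{j+1}), t_{j+1})}{\partial x(t_{j+1})} - \sum_{k=1}^{m} \gamma_k'(t_{j+1}^-)\,\frac{\partial^2 V_k(x(t_{j+1}), t_{j+1})}{\partial x(t_{j+1})^2} \tag{2.10}$$

$$\dot{\gamma}_i' = -\frac{\partial H_0}{\partial p_i}, \qquad \gamma_i(t_j^+) = 0, \qquad i = 1,\ldots,m \tag{2.11}$$

where $\gamma_i(t_j^-) = \lim_{t \to t_j} \gamma_i(t)$ for $\gamma_i$ defined on the (j-1)st interval $[t_{j-1}, t_j)$,
and $\gamma_i(t_j^+) = \gamma_i(t_j)$ defined on the jth interval $[t_j, t_{j+1})$,

$$\frac{\partial H_0}{\partial u_0} = 0 \tag{2.12}$$

$$\frac{\partial H_0}{\partial u_i} = 0, \qquad i = 1,\ldots,m \tag{2.13}$$

where

$$H_0(x, \lambda, p_i, \gamma_i, \beta_i;\ i = 1,2,\ldots,m,\ u_j;\ j = 0,1,\ldots,m) = L_0(x, u_i;\ i = 0,1,\ldots,m) + \lambda' f(x, u_i;\ i = 0,1,\ldots,m) + \sum_{k=1}^{m} \Big[ \gamma_k' \Big( -\frac{\partial H_k}{\partial x} \Big)' + \beta_k' \Big( \frac{\partial H_k}{\partial u_k} \Big)' \Big]. \tag{2.14}$$

Equation (2.13) and the constraints appended under the summation
sign in (2.14) are due to the leader taking into account the reactions of
the lower level decisionmakers. The solution conditions on the $\beta_i$ are
implicit in equation (2.13).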


2.3.  The  Linear  Quadratic  Case 


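Assume the system is linear

$$\dot{x} = Ax + \sum_{i=0}^{m} B_i u_i \tag{2.15}$$

$$x(t_0) = x_0 \tag{2.16}$$

and the criteria quadratic

$$J_i = \frac{1}{2}\, x' K_{if}\, x \Big|_{t = t_f} + \frac{1}{2} \int_{t_0}^{t_f} \Big( x' Q_i x + \sum_{j=0}^{m} u_j' R_{i,j} u_j \Big)\,dt. \tag{2.17}$$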
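The necessary conditions for the lower level controllers for
$t \in [t_j, t_{j+1})$ and $i = 1,\ldots,m$ are

$$\dot{p}_i = -Q_i x - A' p_i, \qquad p_i'(t_{j+1}) = \frac{\partial V_i(x(t_{j+1}), t_{j+1})}{\partial x(t_{j+1})} \tag{2.18}$$

$$u_i = -R_{i,i}^{-1} B_i' p_i. \tag{2.19}$$

The necessary conditions for the leader are

$$\dot{\lambda} = -Q_0 x - A'\lambda + \sum_{i=1}^{m} Q_i \gamma_i, \qquad \lambda'(t_{j+1}) = \frac{\partial V_0(x(t_{j+1}), t_{j+1})}{\partial x(t_{j+1})} - \sum_{i=1}^{m} \gamma_i'(t_{j+1}^-)\,\frac{\partial^2 V_i(x(t_{j+1}), t_{j+1})}{\partial x(t_{j+1})^2} \tag{2.20}$$

$$\dot{\gamma}_i = A\gamma_i - S_{0,i}\, p_i + S_i \lambda, \qquad \gamma_i(t_j^+) = 0 \tag{2.21}$$

$$u_0 = -R_{0,0}^{-1} B_0' \lambda \tag{2.22}$$

where

$$S_i = B_i R_{i,i}^{-1} B_i', \quad i = 0,1,\ldots,m, \qquad S_{0,i} = B_i R_{i,i}^{-1} R_{0,i} R_{i,i}^{-1} B_i', \quad i = 1,\ldots,m.$$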
During each interval, the state will evolve according to

$$\dot{x} = Ax - \sum_{i=1}^{m} S_i p_i - S_0 \lambda \tag{2.23}$$

for $t \in [t_j, t_{j+1})$ where $x(t_j)$ is determined in the previous interval.

If  the  state  measurements  are  made  at  r  discrete  instances  in 
time, we are faced with an (r+1)-point boundary value problem. At this
stage,  there  are  two  alternate  approaches  we  can  take  to  the  problem.  The 
first  and  standard  approach  starts  by  assuming  an  explicit  functional 
dependence  of  the  costates  on  the  state.  This  results  in  a  set  of  coupled 
matrix  Riccati  equations  which  must  be  solved  repeatedly  at  each  sample 
time.  A  general  algorithm  for  the  efficient  solution  of  these  equations  for 
each  new  set  of  boundary  conditions  will  be  outlined  in  the  next  section. 

We  will  also  consider  an  even  more  efficient  approach  utilizing  invariant 
imbedding [13,14]. It is based on an assumption of the functional dependence
of  the  state  and  costates  on  one  another  and  of  their  explicit  dependence 
on  their  respective  boundary  conditions.  This  result  will  be  shown  in 
detail. 


2.4.  Solution  of  the  Linear  Quadratic  Problem 

The  first  approach  to  dealing  with  the  r+1  point  boundary  value 
problem  starts  by  assuming  that  the  costates  depend  on  the  states  by  affine 
functions.  The  affine  dependence,  rather  than  simply  linear,  is  necessary 
so that the lower level decisionmakers will be able to calculate their
controls as functions of the leader's announced control, i.e., their
computations will be coupled to the leader's sequentially, not simultaneously.

Differential  equations  can  be  found  for  the  coefficients  of  these 
functions and for the associated costs to go. If m is the number of
controllers, the problem can be reduced to that of solving m coupled matrix
Riccati equations and m matrix Lyapunov equations at each sample time, all
with boundary conditions at a common time. The same set of equations is
re-solved at each sample time with only a change in the boundary conditions.

A  sampled  data  Nash  formulation  has  been  considered  by  Simaan  and  Cruz  [9] 
and  a  computational  technique  for  the  solution  of  the  resultant  Riccati 
equations has also been obtained [10]. We have obtained a generalization
of  [10]  in  which  the  solutions  of  the  Riccati  equations  are  expressed  in 
terms  of  a  preliminary  solution  due  to  a  specific  set  of  boundary  conditions 
and  a  correction  term  dependent  on  the  actual  boundary  conditions.  An 
algorithm is found for computing these correction terms which requires the solution
of m uncoupled matrix Riccati equations, thus providing substantial
improvement over a brute force solution of the coupled equations.

With the first technique, we assume that the cost to go functions
are of the form

$$V_{1i}(t) = \big( \tfrac{1}{2} x' E_{1i} x + e_{1i}' x + q_{1i} \big) \big|_t$$

$$V_2(t) = \big( \tfrac{1}{2} x' E_2 x + e_2' x + q_2 \big) \big|_t$$

in which case, the boundary conditions in (2.18) and (2.20) are

$$p_i(t_{j+1}) = (E_{1i} x + e_{1i}) \big|_{t_{j+1}}$$

$$\lambda(t_{j+1}) = \Big( E_2 x + e_2 - \sum_{i=1}^{m} E_{1i} \gamma_i \Big) \Big|_{t_{j+1}}$$

As is conventionally done in solving the two point boundary
value problems in optimal control, we assume a functional dependence of the
costates on the state. An affine dependence is assumed due to the nature
of the Stackelberg problem. Thus we assume

$$p_i = K_{1i} x + g_{1i}$$

$$\gamma_i = K_{3i} x + g_{3i}$$

$$\lambda = K_2 x + g_2. \tag{2.24}$$
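
To see how an affine assumption of this kind turns a costate equation into equations for its coefficients, consider the follower costate $p_i$. The following sketch assumes the follower costate equation $\dot{p}_i = -Q_{1i} x - A' p_i$ and the state equation $\dot{x} = Ax - \sum_j S_{1j} p_j - S_2 \lambda$, where $Q_{1i}$, $S_{1j}$ and $S_2$ are taken to be this section's labels for the follower weighting matrices and for the follower and leader S matrices (an assumption about the change of indexing between sections). Differentiating $p_i = K_{1i} x + g_{1i}$ and substituting gives

$$\dot{K}_{1i} x + K_{1i} \Big[ Ax - \sum_{j=1}^{m} S_{1j} (K_{1j} x + g_{1j}) - S_2 (K_2 x + g_2) \Big] + \dot{g}_{1i} = -Q_{1i} x - A' (K_{1i} x + g_{1i}).$$

Requiring this to hold for every x, the terms multiplying x and the remaining terms must balance separately:

$$\dot{K}_{1i} = -Q_{1i} - A' K_{1i} - K_{1i} A + K_{1i} \Big( \sum_{j=1}^{m} S_{1j} K_{1j} + S_2 K_2 \Big), \qquad \dot{g}_{1i} = -A' g_{1i} + K_{1i} \Big( \sum_{j=1}^{m} S_{1j} g_{1j} + S_2 g_2 \Big).$$

The same steps applied to $\lambda$ and to the $\gamma_i$ produce the remaining coefficient equations.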


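By differentiation of these equations and from the equations of
Section 2.3, we find that the K matrices and g vectors must satisfy

$$\dot{K}_{1i} = -Q_{1i} - A' K_{1i} - K_{1i} A + K_{1i} \Big( \sum_{j=1}^{m} S_{1j} K_{1j} + S_2 K_2 \Big), \qquad K_{1i}(t_{j+1}) = E_{1i}(t_{j+1})$$

$$\dot{g}_{1i} = -A' g_{1i} + K_{1i} \Big( \sum_{j=1}^{m} S_{1j} g_{1j} + S_2 g_2 \Big), \qquad g_{1i}(t_{j+1}) = e_{1i}(t_{j+1})$$

$$\dot{K}_2 = -Q_2 - A' K_2 - K_2 A + \sum_{i=1}^{m} Q_{1i} K_{3i} + K_2 \sum_{i=1}^{m} S_{1i} K_{1i} + K_2 S_2 K_2$$

$$K_2(t_{j+1}) = E_2(t_{j+1}) - \sum_{i=1}^{m} E_{1i}(t_{j+1}) K_{3i}(t_{j+1})$$

$$\dot{g}_2 = -A' g_2 + K_2 \sum_{i=1}^{m} S_{1i} g_{1i} + K_2 S_2 g_2 + \sum_{i=1}^{m} Q_{1i} g_{3i}$$

$$g_2(t_{j+1}) = e_2(t_{j+1}) - \sum_{i=1}^{m} E_{1i}(t_{j+1}) g_{3i}(t_{j+1})$$

$$\dot{K}_{3i} = A K_{3i} - K_{3i} A - S_{2,1i} K_{1i} + S_{1i} K_2 + K_{3i} \sum_{j=1}^{m} S_{1j} K_{1j} + K_{3i} S_2 K_2$$

$$\dot{g}_{3i} = A g_{3i} - S_{2,1i}\, g_{1i} + S_{1i}\, g_2 + K_{3i} \Big( \sum_{j=1}^{m} S_{1j} g_{1j} + S_2 g_2 \Big)$$

$$K_{3i}(t_j) = 0, \qquad g_{3i}(t_j^+) = 0 \tag{2.25}$$

The coefficients in the cost to go functions must, in each
interval, satisfy the following equations

$$\dot{E}_{1i} = -E_{1i} \tilde{A} - \tilde{A}' E_{1i} - N_{1i}^1$$

$$\dot{e}_{1i} = -\tilde{A}' e_{1i} + E_{1i} \Big( \sum_{j=1}^{m} S_{1j} g_{1j} + S_2 g_2 \Big) - N_{1i}^2$$

$$\dot{q}_{1i} = \Big( \sum_{j=1}^{m} g_{1j}' S_{1j} + g_2' S_2 \Big) e_{1i} - N_{1i}^3$$

$$E_{1i}(t_f) = K_{if}, \qquad e_{1i}(t_f) = 0, \qquad q_{1i}(t_f) = 0$$

$$E_{1i}(t_{j+1}^-) = E_{1i}(t_{j+1}^+), \qquad e_{1i}(t_{j+1}^-) = e_{1i}(t_{j+1}^+), \qquad q_{1i}(t_{j+1}^-) = q_{1i}(t_{j+1}^+) \tag{2.26}$$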

and the $\tilde{A}$, $N_{1i}^j$, $j = 1,2,3$, are known in terms of the previous solution of
(2.25). Equations of the same form are also satisfied by the coefficients $E_2$,
$e_2$, and $q_2$. The assumed dependence of the costates on the state, equation
(2.24), results in a set of equations which, unlike the conventional optimal
control problem, are themselves a two point boundary value problem. This
is a result of the leader appending the two point boundary value problem
which results from the followers' optimizations. So, we assume an explicit
dependence of the solution of the $K_{3i}$ and the $g_{3i}$ on the solution of the
$K_{1i}$, $g_{1i}$, $K_2$ and $g_2$ in order to reduce these equations to a solvable single
point boundary value problem. Thus we assume

$$K_{3i} = F_{3i,2} K_2 + \sum_{j=1}^{m} F_{3i,1j} K_{1j} + F_{3i,4}$$

$$g_{3i} = F_{3i,2}\, g_2 + \sum_{j=1}^{m} F_{3i,1j}\, g_{1j}$$

Notice  that  the  same  coefficient  matrices  appear  in  both  equations.  This, 
it  turns  out,  is  sufficient  to  obtain  the  desired  dependence.  The  differential 
equations  that  these  coefficient  matrices  must  satisfy  are 
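$$\dot{F}_{3i,2} = A F_{3i,2} + F_{3i,2} A' + S_{1i} + F_{3i,4} S_2 - F_{3i,2} \sum_{j=1}^{m} Q_{1j} F_{3j,2}, \qquad F_{3i,2}(t_j) = 0$$

$$\dot{F}_{3i,1k} = A F_{3i,1k} + F_{3i,1k} A' + F_{3i,4} S_{1k} - F_{3i,2} \sum_{j=1}^{m} Q_{1j} F_{3j,1k}, \qquad i \ne k$$

$$\dot{F}_{3i,1k} = A F_{3i,1k} + F_{3i,1k} A' + F_{3i,4} S_{1k} - F_{3i,2} \sum_{j=1}^{m} Q_{1j} F_{3j,1k} - S_{2,1k}, \qquad i = k$$

$$F_{3i,1k}(t_j) = 0$$

$$\dot{F}_{3i,4} = A F_{3i,4} + F_{3i,4} A' + F_{3i,2} Q_2 - F_{3i,2} \sum_{j=1}^{m} Q_{1j} F_{3j,4} + \sum_{j=1}^{m} F_{3i,1j} Q_{1j}, \qquad F_{3i,4}(t_j) = 0$$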


These  equations  are  solved  once  only,  and  for  a  period  of  one  sample 
interval.  The  solution  is  then  used  repeatedly  during  each  sample  interval, 
plugging  into  equations  (2.25),  converting  (2.25)  to  a  single  point  boundary 
value  problem  in  which  only  the  boundary  condition  changes  between  sample 
intervals. 


Having done this, we are now at a point where the solution is
expressed entirely in terms of equations of the following general form of
the coupled matrix Riccati equations

$$\dot{K}_j^{\ell} = A_{1,j}^{\ell} K_j^{\ell} + K_j^{\ell} A_{2,j}^{\ell} + \sum_{k=1}^{\ell} T_{j,k}^{\ell} K_k^{\ell} + K_j^{\ell} \sum_{k=1}^{\ell} D_{j,k}^{\ell} K_k^{\ell} + Q_j^{\ell}, \qquad K_j^{\ell}(t_{i+1}) \ \text{known} \tag{R}$$

where $\ell$ is the number of coupled equations. These must be solved repeatedly
with changes occurring only in the boundary conditions.

The  final  step  for  this  technique  of  solving  the  sampled  data 
problem  is  to  derive  an  efficient  technique  for  the  repeated  solution  of 
coupled  equations  of  the  form  (R) .  What  follows  is  a  generalization  of  [10] 
and  [15]. 

The approach taken is to express the solutions $K_j^{\ell}$, corresponding
to the actual boundary conditions, in terms of a solution $\hat{K}_j^{\ell}$ which
corresponds to some other arbitrary, known boundary conditions. Equations
for the correction terms are found and a sequence of steps leads to a
solution which requires $\ell$ uncoupled matrix Riccati equations to be solved
each time, while some auxiliary equations must be solved once only and over a
period equal to the sample interval. The details follow.

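Define

$$Z_j^{\ell} = [K_j^{\ell} - \hat{K}_j^{\ell}]^{-1}$$

$$K_j^{\ell} = \hat{K}_j^{\ell} + (Z_j^{\ell})^{-1}$$

Differentiating, we find

$$\dot{Z}_j^{\ell} = -Z_j^{\ell} A_{1,j}^{\ell} + \hat{A}_j^{\ell} Z_j^{\ell} + Z_j^{\ell} \sum_{k=1}^{\ell} G_{j,k}^{\ell} (Z_k^{\ell})^{-1} Z_j^{\ell} + \sum_{k=1}^{\ell} H_{j,k}^{\ell} (Z_k^{\ell})^{-1} Z_j^{\ell}, \qquad j = 1,2,\ldots,\ell$$

where

$$\hat{A}_j^{\ell} = \Big( -A_{2,j}^{\ell} - \sum_{k=1}^{\ell} D_{j,k}^{\ell} \hat{K}_k^{\ell} \Big)$$

$$G_{j,k}^{\ell} = ( -\hat{K}_j^{\ell} D_{j,k}^{\ell} - T_{j,k}^{\ell} )$$

$$H_{j,k}^{\ell} \triangleq -D_{j,k}^{\ell}$$

If we define the term

$$K_j^{\ell-1} \triangleq (Z_j^{\ell})^{-1} Z_{\ell}^{\ell}, \qquad j = 1,2,\ldots,\ell-1,$$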
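differentiation yields equations of the form (R) in the $K_j^{\ell-1}$, with

$$K_j^{\ell-1}(t_{k+1}) = [Z_j^{\ell}(t_{k+1})]^{-1} Z_{\ell}^{\ell}(t_{k+1}), \qquad j = 1,\ldots,\ell-1,$$

where

$$A_{1,j}^{\ell-1} = A_{1,j}^{\ell}$$

$$A_{2,j}^{\ell-1} = ( -A_{1,\ell}^{\ell} + G_{\ell,\ell}^{\ell} )$$

$$T_{j,k}^{\ell-1} = -G_{j,k}^{\ell}$$

$$D_{j,k}^{\ell-1} = G_{\ell,k}^{\ell}$$

$$Q_j^{\ell-1} = -G_{j,\ell}^{\ell}$$
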

So, the solution of the $Z_j^{\ell}$ equations can be obtained once the $K_j^{\ell-1}$ equations
are solved. However, the solutions for the original equations, $K_j^{\ell}$, are
expressed in terms of the $(Z_j^{\ell})^{-1}$, $j = 1,\ldots,\ell$ and, given $(Z_{\ell}^{\ell})^{-1}$, the remaining
$(Z_j^{\ell})^{-1}$ for $j = 1,\ldots,\ell-1$ are known in terms of $(Z_{\ell}^{\ell})^{-1}$ and the $K_j^{\ell-1}$,
$j = 1,\ldots,\ell-1$. For each i, $i = 1,\ldots,\ell$, the term $(Z_i^i)^{-1}$ satisfies an
uncoupled matrix Riccati equation.

Thus, we have the $K_j^{\ell}$, $j = 1,\ldots,\ell$ known in terms of $(Z_{\ell}^{\ell})^{-1}$ and the $K_j^{\ell-1}$,
$j = 1,\ldots,\ell-1$. The equations in $K_j^{\ell-1}$ are of the same form as the equations
in $K_j^{\ell}$ and so the technique is applied again and is done recursively until
we reach $K_1^1$. Notice that at level i of the recursion only one equation, in
$(Z_i^i)^{-1}$, need be integrated.


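In summary, once the preliminary solutions, $\hat{K}_j^{\ell}$, are found, the
desired solutions are obtained as follows:

solve for $(Z_1^1)^{-1}$
$K_1^1 = \hat{K}_1^1 + (Z_1^1)^{-1}$

solve for $(Z_2^2)^{-1}$
$(Z_1^2)^{-1} = K_1^1 (Z_2^2)^{-1}$
$K_2^2 = \hat{K}_2^2 + (Z_2^2)^{-1}$
$K_1^2 = \hat{K}_1^2 + (Z_1^2)^{-1}$

solve for $(Z_3^3)^{-1}$
$(Z_1^3)^{-1} = K_1^2 (Z_3^3)^{-1}$
$(Z_2^3)^{-1} = K_2^2 (Z_3^3)^{-1}$
$K_3^3 = \hat{K}_3^3 + (Z_3^3)^{-1}$
$K_2^3 = \hat{K}_2^3 + (Z_2^3)^{-1}$
$K_1^3 = \hat{K}_1^3 + (Z_1^3)^{-1}$

solve for $(Z_{\ell}^{\ell})^{-1}$
$(Z_j^{\ell})^{-1} = K_j^{\ell-1} (Z_{\ell}^{\ell})^{-1}$, $j = 1,\ldots,\ell-1$
$K_j^{\ell} = \hat{K}_j^{\ell} + (Z_j^{\ell})^{-1}$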

So, in this first approach, the sampled data Stackelberg problem
is reduced to the solution of a set of $\ell$ coupled matrix Riccati equations,
each of dimension n, which must be solved at each sample time with only a
change in the boundary conditions. The solution of these equations for
any boundary condition is then found to be expressible in terms of the
solution of an auxiliary problem. The needed correction terms require the
solution of $\ell$ uncoupled matrix Riccati equations of dimension n at each
sample time. Considerable savings in computation will accrue if there are
a large number of samples, which is typically the case.
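
The idea of a preliminary solution plus a correction term can be checked numerically in the simplest setting. The sketch below is only an illustration under stated assumptions: a single scalar equation of the form (R) (one equation, so no recursion is needed), hypothetical coefficient values, and a hand-written Runge-Kutta integrator. With one equation, Z = 1/(K - K_hat) obeys an equation that is linear in Z, as read from the definitions of A-hat, G and H above; the sketch integrates Z itself and inverts at the end, rather than integrating its inverse as the text describes.

```python
def rk4_backward(f, y_end, t_end, t_start, steps):
    """Integrate dy/dt = f(t, y) from t_end back to t_start (classical RK4)."""
    h = (t_start - t_end) / steps  # negative step: integration runs backward
    t, y = t_end, list(y_end)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        t += h
    return y


# Hypothetical scalar coefficients of (R) with a single equation:
#   dK/dt = A1*K + K*A2 + T*K + K*D*K + Q
A1, A2, T, D, Q = -0.5, -0.5, 0.0, 1.0, -1.0


def riccati_rhs(k):
    return A1 * k + k * A2 + T * k + k * D * k + Q


def direct(t, y):
    return [riccati_rhs(y[0])]


def preliminary_and_correction(t, y):
    k_hat, z = y
    a_hat = -A2 - D * k_hat   # A-hat of the text
    g = -k_hat * D - T        # G of the text
    h_coef = -D               # H of the text
    # Single-equation case: dZ/dt = -Z*A1 + A_hat*Z + Z*G + H (linear in Z)
    return [riccati_rhs(k_hat), -z * A1 + a_hat * z + z * g + h_coef]


def direct_solution(k_boundary, t_end=1.0, t_start=0.0, steps=2000):
    """Solve the Riccati equation directly for the actual boundary value."""
    return rk4_backward(direct, [k_boundary], t_end, t_start, steps)[0]


def rebuilt_solution(k_boundary, k_hat_boundary, t_end=1.0, t_start=0.0, steps=2000):
    """Rebuild K from a preliminary solution K_hat and the correction 1/Z."""
    z_boundary = 1.0 / (k_boundary - k_hat_boundary)
    k_hat, z = rk4_backward(preliminary_and_correction,
                            [k_hat_boundary, z_boundary], t_end, t_start, steps)
    return k_hat + 1.0 / z
```

In this scalar setting the preliminary solution could be computed once and stored, after which each new boundary condition requires only the correction equation.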

2.4.1.  The  Second  Approach;  Invariant  Imbedding 

The  ultimate  goal  when  deriving  the  solution  technique  is  to 
minimize  the  amount  of  computations  required  by  taking  advantage  of  the 
fact  that  the  equations  to  be  solved  are  the  same  in  each  sample  interval 
and  only  the  boundary  conditions  change. 

The derivations performed in the remainder of this section will
proceed as outlined below. First we define more compact notation, grouping
the state and costates according to their boundary conditions. We then
assume an explicit functional dependence of the costates on the state and
on the costates' boundary conditions. Due to this assumption, the solutions
of the resultant equations are independent of the changing costates'
boundary conditions and it is because of this independence that we are able
to obtain the computational savings. The cost to go equations are derived
since they are needed to generate the appropriate boundary conditions to
plug into the solution functions. A functional dependence of the costs to
go on their boundary conditions is also assumed and finally the boundary
conditions for each interval are established in terms of those in the
adjacent interval. The details of the derivation follow.

Rather than making the standard assumption of a functional
dependence of the costates on the state alone as in the first approach, we
will make a different assumption. Notice that on the interval $[t_j, t_{j+1})$,
the costates $p_i$, $\forall i$, equations (2.18), and $\lambda$, equation (2.20), have
boundary conditions at $t_{j+1}$. The costates $\gamma_i$, $\forall i$, equations (2.21), and the
state x, equation (2.23), have boundary conditions at $t_j$. For convenience
of notation, let us group the state and costate vectors according to
boundary conditions as follows


$$y_1 \triangleq x$$

$$y_2 \triangleq (\gamma_1' : \gamma_2' : \cdots : \gamma_m')'$$

$$y_3 \triangleq (\lambda' : p_1' : p_2' : \cdots : p_m')'.$$


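Now equations (2.18), (2.20), (2.21) and (2.23) can be expressed as

$$\frac{d}{dt} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} A_{11} & 0 & A_{13} \\ 0 & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \tag{2.27}$$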


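where the $A_{ij}$ of (2.27) are appropriate concatenations of the Q, A and S
matrices of (2.18), (2.20), (2.21) and (2.23). In each interval $[t_j, t_{j+1})$
the vectors $y_1$ and $y_2$ have boundary conditions at $t_j$ and the vector $y_3$ has
boundary conditions at $t_{j+1}$,

$$y_2(t_j^+) = 0 \tag{2.28}$$

$$y_3(t_{j+1}^-) = \begin{bmatrix} \lambda \\ p_1 \\ \vdots \\ p_m \end{bmatrix} = \begin{bmatrix} \Big( \dfrac{\partial V_0}{\partial y_1} \Big)' - \sum_{i=1}^{m} \dfrac{\partial^2 V_i}{\partial y_1^2}\, \gamma_i \\ \Big( \dfrac{\partial V_1}{\partial y_1} \Big)' \\ \vdots \\ \Big( \dfrac{\partial V_m}{\partial y_1} \Big)' \end{bmatrix}_{t_{j+1}} \tag{2.29}$$

where $y_2(t_j^+) = y_2(t_j)$ defined on the interval $[t_j, t_{j+1})$ and $y_3(t_{j+1}^-) = \lim_{t \to t_{j+1}} y_3(t)$,
for $y_3(t)$ defined on the interval $[t_j, t_{j+1})$.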

It is in the next step where we deviate from the standard
approach. We will make assumptions of the functional dependence of the
costates on the state and on the costates' boundary conditions. In so doing
we will be able to solve for these functions independent of the costates'
boundary conditions.

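For $t \in [t_j, t_{j+1})$ assume^1

$$ y_2(t) = F_1(t) y_1(t) + F_2(t) y_2(t_j) + F_3(t) y_3(t_{j+1}) \tag{2.30} $$

and

$$ y_3(t) = G_1(t) y_1(t) + G_2(t) y_2(t) + G_3(t) y_3(t_{j+1}). \tag{2.31} $$

By differentiation of (2.30) and (2.31) and by substitution of (2.27), we
find^2

$$ \dot G_1 = A_{31} + A_{33}G_1 - G_1A_{11} - G_1A_{13}G_1 - G_2A_{23}G_1, \quad G_1(t_{j+1}) = 0 \tag{2.32} $$

$$ \dot G_2 = A_{32} + A_{33}G_2 - G_1A_{13}G_2 - G_2A_{22} - G_2A_{23}G_2, \quad G_2(t_{j+1}) = 0 \tag{2.33} $$

$$ \dot G_3 = (A_{33} - G_1A_{13} - G_2A_{23})G_3, \quad G_3(t_{j+1}) = I \tag{2.34} $$

$$ \dot F_1 = (A_{22} + A_{23}G_2)F_1 - F_1(A_{11} + A_{13}G_1) - F_1A_{13}G_2F_1 + A_{23}G_1, \quad F_1(t_j) = 0 \tag{2.35} $$

$$ \dot F_2 = A_{22}F_2 + A_{23}G_2F_2 - F_1A_{13}G_2F_2, \quad F_2(t_j) = I \tag{2.36} $$

$$ \dot F_3 = (A_{22} + A_{23}G_2 - F_1A_{13}G_2)F_3 + A_{23}G_3 - F_1A_{13}G_3, \quad F_3(t_j) = 0. \tag{2.37} $$

^1 The dependence of $y_3(t)$ on $y_2(t)$ instead of $y_2(t_j)$ results in simplified
computations.

^2 All matrices are evaluated at time $t$ unless indicated otherwise.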

Since $y_2(t_j) = 0$ and by substituting (2.30) into (2.31) we have

$$ y_2(t) = F_1(t) y_1(t) + F_3(t) y_3(t_{j+1}) \tag{2.38} $$

$$ y_3(t) = \hat G_1(t) y_1(t) + \hat G_3(t) y_3(t_{j+1}) \tag{2.39} $$

where $\hat G_1 = G_1 + G_2F_1$ and $\hat G_3 = G_3 + G_2F_3$.

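For $t \in [t_j, t_{j+1})$ assume^1

$$ y_1(t) = H_1(t) y_1(t_j) + H_3(t) y_3(t_{j+1}). \tag{2.40} $$

By differentiation of (2.40) and substitution of (2.27) and (2.39) we find

$$ \dot H_1 = (A_{11} + A_{13}\hat G_1)H_1, \quad H_1(t_j) = I \tag{2.41} $$

$$ \dot H_3 = (A_{11} + A_{13}\hat G_1)H_3 + A_{13}\hat G_3, \quad H_3(t_j) = 0. \tag{2.42} $$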
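The step from (2.40) to (2.41) and (2.42) can be spelled out as follows. This is only a sketch: it uses nothing beyond (2.27), (2.39) and (2.40), writes $\hat G_1 = G_1 + G_2F_1$ and $\hat G_3 = G_3 + G_2F_3$ for the combined coefficients appearing in (2.39), and assumes that $y_1(t_j)$ and $y_3(t_{j+1})$ may be treated as arbitrary so that their coefficients can be matched separately.

```latex
\begin{align*}
\dot y_1 &= A_{11} y_1 + A_{13} y_3
          = (A_{11} + A_{13}\hat G_1)\, y_1 + A_{13}\hat G_3\, y_3(t_{j+1}) \\
         &= (A_{11} + A_{13}\hat G_1) H_1\, y_1(t_j)
          + \bigl[(A_{11} + A_{13}\hat G_1) H_3 + A_{13}\hat G_3\bigr]\, y_3(t_{j+1}), \\
\dot y_1 &= \dot H_1\, y_1(t_j) + \dot H_3\, y_3(t_{j+1}).
\end{align*}
```

Equating the coefficients of $y_1(t_j)$ and of $y_3(t_{j+1})$ in the last two lines gives the two differential equations, and evaluating (2.40) at $t = t_j$ gives the boundary values $H_1(t_j) = I$ and $H_3(t_j) = 0$.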


If the system (2.15) and the criteria functions (2.17) are time invariant and
if the sampling rate is constant, that is if $(t_{j+1} - t_j) = T = \text{constant}$ for all
$j$, the equations (2.32) through (2.37), (2.41) and (2.42) will be the same
for each interval. Then, since their boundary conditions are invariant,
these equations will have to be solved only once and the same solution will
be valid for every interval $j = 0, 1, \ldots, r-1$.

2.4.2. Boundary Conditions and Cost To Go Equations

The boundary conditions for the costate equations on the jth
interval $[t_j, t_{j+1})$ are known in terms of the costs to go at the end of the
interval, (2.7) and (2.10). Therefore, for the purpose of obtaining the
costates' boundary conditions, we must first derive the cost to go equations.
First, substituting (2.19) and (2.22) for the controls and with the form of
the solution for $y_3$ as in (2.39), recalling that $y_3' = (\lambda' : p_1' : \cdots : p_m')$,
the integrands of the criterion functions can be written

$$ L_i = \tfrac{1}{2}\Bigl(x'Q_ix + \sum_{j=0}^{m} u_j'R_{ij}u_j\Bigr) = \tfrac{1}{2}\bigl(y_1'Q_iy_1 + y_3'S_iy_3\bigr) $$

and for $t \in [t_j, t_{j+1})$,

$$ L_i = \tfrac{1}{2}\bigl[y_1'\hat S_{i1}y_1 + y_3(t_{j+1})'\hat S_{i2}\,y_3(t_{j+1})\bigr] + y_1'\hat S_{i3}\,y_3(t_{j+1}) \tag{2.43} $$

where all variables are evaluated at time $t$ unless indicated otherwise, and
where

$$ \hat S_{i1} = Q_i + \hat G_1'S_i\hat G_1, \qquad \hat S_{i2} = \hat G_3'S_i\hat G_3, \qquad \hat S_{i3} = \hat G_1'S_i\hat G_3 $$

and

$$ S_i \triangleq \begin{bmatrix} S_{i0} & & & \\ & S_{i1} & & \\ & & \ddots & \\ & & & S_{im} \end{bmatrix}. $$

^1 Because (2.30) and (2.31) reduce to (2.38) and (2.39), the dependence of
$y_1(t)$ on $y_2(t_j)$ need not be assumed.


Due to the assumed explicit dependence of the costates, $y_3(t)$, on
their boundary conditions in each interval, we must make a similar assumption
for the form of the cost to go equations so that they will also be independent
of the changing boundary conditions. That is, for the interval $t \in [t_j, t_{j+1})$
we define the function

$$ V_i(y_1(t), t) = \tfrac{1}{2}\bigl(y_1(t)'C_{i1}(t)y_1(t) + y_3(t_{j+1}^-)'C_{i2}(t)\,y_3(t_{j+1}^-)\bigr) + y_1(t)'C_{i3}(t)\,y_3(t_{j+1}^-). \tag{2.44} $$

When evaluated at $t_j$, with the controls in the interval $[t_j, t_f]$ being the
optimal controls defined according to (2.4), this function is then the
optimum cost to go, denoted $V_i^*(y_1(t_j), t_j)$. By (2.44) we see that on the
interval $[t_j, t_{j+1})$, the cost to go is not only quadratic in $y_1$, but also
has a quadratic term in $y_3(t_{j+1}^-)$ and a cross term in $y_1(t)$ and $y_3(t_{j+1}^-)$.

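From the relationship between the costs to go (2.44) and the
integrands of the criteria functions (2.43), the differential equations of
the coefficient matrices in (2.44) are found to be

$$ \dot C_{i1} = -\hat S_{i1} - C_{i1}\hat A_{11} - \hat A_{11}'C_{i1} \tag{2.45} $$

$$ \dot C_{i2} = -\hat S_{i2} - 2\hat A_{13}'C_{i3} \tag{2.46} $$

$$ \dot C_{i3} = -\hat S_{i3} - C_{i1}\hat A_{13} - \hat A_{11}'C_{i3} \tag{2.47} $$

where $\hat A_{11} = (A_{11} + A_{13}\hat G_1)$ and $\hat A_{13} = A_{13}\hat G_3$.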


2.4.3.  Boundary  Conditions 

The boundary conditions for the last interval, that is, at the
terminal time, $t_f$, are

$$ C_{i1}(t_f) = K_{if}, \qquad C_{i2}(t_f) = 0, \qquad C_{i3}(t_f) = 0. \tag{2.48} $$


We must also establish appropriate boundary conditions for the remaining
intervals. The costs to go must be continuous and therefore

$$ V_i(y_1(t_j), t_j^-) = V_i(y_1(t_j), t_j^+). \tag{2.49} $$

Since the cost to go equations are integrated backwards, we are trying to
establish the $C_{ik}(t_j^-)$ in terms of the $C_{ik}(t_j^+)$ at each $j$, for each $i$, and
for all $k$, $k = 1, 2, 3$.

Let us choose

$$ C_{i2}(t_j^-) = 0 \tag{2.50} $$

$$ C_{i3}(t_j^-) = 0 \tag{2.51} $$

for all $j$ and for all $i$. So now we must simply find $C_{i1}(t_j^-)$ in terms of the
$C_{ik}(t_j^+)$ for $k = 1, 2$, and $3$.

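Due to their interrelatedness, we must simultaneously consider
solving for the boundary conditions $y_3(t_{j+1}^-)$ from (2.18), (2.20) and (2.44)
and solving for the $C_{i1}(t_j^-)$ in terms of the $C_{ik}(t_j^+)$, $k = 1, 2, 3$, from (2.49).

To minimize the required computations, it is advantageous if
$y_3(t)$ is broken up

$$ y_3 = \begin{bmatrix} y_3^1 \\ y_3^2 \end{bmatrix}, \qquad y_3^1 = \lambda, \qquad y_3^2 = (p_1' : \cdots : p_m')'. \tag{2.52} $$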


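The derivation of the boundary conditions for the jth interval
$[t_j, t_{j+1})$ proceeds as follows. From (2.29), (2.44), (2.50) and (2.51)

$$ y_3^2(t_{j+1}^-) = \begin{bmatrix} C_{11} \\ C_{21} \\ \vdots \\ C_{m1} \end{bmatrix}_{t_{j+1}} y_1(t_{j+1}) \tag{2.53} $$

and

$$ y_3^1(t_{j+1}^-) = \bigl[C_{01}y_1 - \tilde C(F_1y_1 + F_3y_3)\bigr]_{t_{j+1}} = \bigl[C_{01}y_1 - \tilde C(F_1y_1 + F_3^1y_3^1 + F_3^2y_3^2)\bigr]_{t_{j+1}} \tag{2.54} $$

where $\tilde C = [C_{11} : \cdots : C_{m1}]$ and where $F_3$ is broken up into $F_3 = [F_3^1 : F_3^2]$
with $F_3^1$ and $F_3^2$ having dimensions which correspond to $y_3^1$ and $y_3^2$. By
substituting (2.53) into (2.54), equation (2.54) becomes

$$ y_3^1(t_{j+1}^-) = \bigl[C_{01}y_1 - \tilde C(F_1y_1 + F_3^1y_3^1 + F_3^2\tilde C'y_1)\bigr]_{t_{j+1}} $$

$$ y_3^1(t_{j+1}^-) = \bigl[(I + \tilde CF_3^1)^{-1}(C_{01} - \tilde C(F_1 + F_3^2\tilde C'))\,y_1\bigr]_{t_{j+1}}. \tag{2.55} $$

Combining (2.53) and (2.55) defines $D_{j+1}$:

$$ y_3(t_{j+1}^-) = D_{j+1}\,y_1(t_{j+1}) \tag{2.56} $$

where

$$ D_{j+1} = \begin{bmatrix} (I + \tilde CF_3^1)^{-1}(C_{01} - \tilde C(F_1 + F_3^2\tilde C')) \\ \tilde C' \end{bmatrix}_{t_{j+1}}. \tag{2.57} $$

By breaking up $y_3$ as in (2.52) we need only invert a matrix of dimension $n$,
the system dimension, to obtain $D_{j+1}$. Otherwise we would have had to invert
a matrix of dimension $n(m+1)$.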

To find the $C_{i1}(t_j^-)$ we also need a relationship between $y_3(t_{j+1}^-)$
and $y_1(t_j)$. That is, from (2.40) and (2.56) we can find

$$ y_3(t_{j+1}^-) = E_j\,y_1(t_j) \tag{2.58} $$

where

$$ E_j = D_{j+1}\bigl(I - H_3(t_{j+1})D_{j+1}\bigr)^{-1}H_1(t_{j+1}). \tag{2.59} $$

So, from (2.44), (2.49), (2.50), (2.51) and (2.58)

$$ C_{i1}(t_j^-) = C_{i1}(t_j^+) + E_j'C_{i2}(t_j^+)E_j + 2C_{i3}(t_j^+)E_j. \tag{2.60} $$

We now have all of the required boundary conditions. The cost to go boundary
conditions are (2.48), (2.50), (2.51) and (2.60) and the costate boundary
conditions are (2.56) or (2.58).
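The boundary-condition formulas (2.57), (2.59) and (2.60) can be sketched numerically as follows. This is a minimal illustration, not the author's implementation: the function names, the argument layout, and the random test matrices are all hypothetical, NumPy is assumed to be available, and the $C_{i1}$ are taken to be symmetric so that stacking them gives $\tilde C'$.

```python
import numpy as np

def boundary_maps(C0, C_list, F1, F3, H1, H3):
    """Return (D, E) as in (2.57) and (2.59); all inputs are taken at t_{j+1}.

    C0     : n x n          leader matrix C_01
    C_list : m arrays       follower matrices C_i1, each n x n and symmetric
    F1     : mn x n,        F3 : mn x n(m+1)
    H1     : n x n,         H3 : n x n(m+1)
    """
    n = C0.shape[0]
    C_t = np.hstack(C_list)              # C~ = [C_11 : ... : C_m1], n x mn
    F3a, F3b = F3[:, :n], F3[:, n:]      # blocks matching y3^1 and y3^2
    # Only an n x n system is solved here, as noted after (2.57).
    top = np.linalg.solve(np.eye(n) + C_t @ F3a,
                          C0 - C_t @ (F1 + F3b @ C_t.T))
    D = np.vstack([top, C_t.T])          # (2.57), n(m+1) x n
    E = D @ np.linalg.solve(np.eye(n) - H3 @ D, H1)   # (2.59)
    return D, E

def update_C1(C1, C2, C3, E):
    """(2.60): C_i1(t_j^-) from the C_ik(t_j^+)."""
    return C1 + E.T @ C2 @ E + 2.0 * C3 @ E

# Hypothetical data, only to exercise the formulas.
rng = np.random.default_rng(0)
n, m = 2, 2
sym = lambda M: 0.5 * (M + M.T)
C0 = sym(rng.normal(size=(n, n)))
C_list = [sym(rng.normal(size=(n, n))) for _ in range(m)]
F1 = 0.1 * rng.normal(size=(m * n, n))
F3 = 0.1 * rng.normal(size=(m * n, n * (m + 1)))
H1 = np.eye(n) + 0.1 * rng.normal(size=(n, n))
H3 = 0.1 * rng.normal(size=(n, n * (m + 1)))
D, E = boundary_maps(C0, C_list, F1, F3, H1, H3)

# Consistency check against the implicit relations (2.53)-(2.54) and (2.40).
y1_end = rng.normal(size=n)
y3_end = D @ y1_end
C_t = np.hstack(C_list)
resid_254 = y3_end[:n] - (C0 @ y1_end - C_t @ (F1 @ y1_end + F3 @ y3_end))
y1_start = rng.normal(size=n)
y3_b = E @ y1_start
resid_256 = y3_b - D @ (H1 @ y1_start + H3 @ y3_b)
```

Both residuals should vanish up to rounding, which is a quick way to confirm that the explicit expressions for $D_{j+1}$ and $E_j$ agree with the implicit boundary relations they were derived from.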


2.4.4.  Solution  of  the  Cost  to  Go  Equations 

In each interval, we do not need the cost to go for all $t \in [t_j, t_{j+1})$
but rather we only need the value at the initial boundary, i.e., we only
need to solve for the $C_{ik}(t_j^+)$ in terms of the $C_{i1}(t_{j+1}^-)$.

The  cost  to  go  equations,  (2.45)  through  (2.47),  are  the  same  for 
each  interval  and  only  the  boundary  conditions  change.  In  order  to  avoid 
resolving  these  equations  in  each  interval,  we  will  assume  a  functional 
dependence  of  the  cost  to  go  matrices  on  their  boundary  conditions,  similar 
to  the  technique  used  on  the  costates.  Since  the  cost  to  go  equations  are 
linear,  we  can  find  such  a  functional  dependence.  It  will  be  independent 
of  the  changing  boundary  conditions  and  can  therefore  be  presolved.  The 
solution  of  the  function  will  be  valid  for  each  interval. 


For notational convenience, we will "stack" the columns of the cost
to go matrices so that the matrix equations (2.45) through (2.47) can be
written as vector equations. Let $c_{ik}$ be the vector corresponding to the
matrix $C_{ik}$. Define $c_i$ as

$$ c_i \triangleq \begin{bmatrix} c_{i1} \\ c_{i2} \\ c_{i3} \end{bmatrix}. \tag{2.61} $$

Then (2.45) through (2.47) can be rewritten as

$$ \dot c_i = A_ic_i + b_i \tag{2.62} $$

where the matrix $A_i$ and the vector $b_i$ are known from the coefficient
matrices of (2.45) through (2.47). We can now solve for the functional
dependence of the solution of (2.62) in the jth interval on the boundary
condition $c_i(t_{j+1}^-)$. Actually, since $c_{i2}(t_{j+1}^-) = 0$ and $c_{i3}(t_{j+1}^-) = 0$, we
need only assume dependence of the solution on $c_{i1}(t_{j+1}^-)$, i.e., for
$t \in [t_j, t_{j+1})$ assume

$$ c_i(t) = M_i(t)\,c_{i1}(t_{j+1}^-) + d_i(t). \tag{2.63} $$

From (2.62) and (2.63) it follows that

$$ \dot M_i = A_iM_i, \qquad M_i(t_{j+1}) = \begin{bmatrix} I \\ 0 \\ 0 \end{bmatrix} \tag{2.64} $$

$$ \dot d_i = A_id_i + b_i, \qquad d_i(t_{j+1}) = 0 \tag{2.65} $$

where the dimension of the identity matrix in $M_i(t_{j+1})$ is the same as the
dimension of $c_{i1}$.

If the system is time invariant and if the sampling rate is
constant then (2.64) and (2.65) need be solved only once over one sampling
interval. In fact, only the values of $M_i(t_j^+)$ and $d_i(t_j^+)$ need be stored since
we only need $c_i(t_j^+)$ in terms of $c_{i1}(t_{j+1}^-)$. That is

$$ c_i(t_j^+) = M_i(t_j^+)\,c_{i1}(t_{j+1}^-) + d_i(t_j^+) \tag{2.66} $$

where $M_i(t_j^+)$ and $d_i(t_j^+)$ are the same for all $j$.

Due to the relationship (2.63), we will not have to solve the
cost to go equations (2.45) through (2.47) repeatedly for each sample
interval but need only plug into (2.66).
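The presolving idea behind (2.63) through (2.66) can be illustrated with a small numerical sketch. It assumes NumPy, uses a hand-written fixed-step Runge-Kutta integrator, and replaces the actual coefficient matrices of (2.45) through (2.47) with hypothetical random data; only the structure (a linear equation whose terminal condition is nonzero in its first block alone) is taken from the text.

```python
import numpy as np

def rk4_backward(rhs, y_end, t_end, t_start, steps=200):
    """Integrate dy/dt = rhs(t, y) from t_end back to t_start (classical RK4)."""
    h = (t_start - t_end) / steps          # negative step: backward in time
    t, y = t_end, np.array(y_end, dtype=float)
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + 0.5 * h, y + 0.5 * h * k1)
        k3 = rhs(t + 0.5 * h, y + 0.5 * h * k2)
        k4 = rhs(t + h, y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

def presolve(A_of_t, b_of_t, n1, T):
    """Solve (2.64) and (2.65) once over one sample interval [0, T]."""
    N = b_of_t(0.0).shape[0]
    M_end = np.vstack([np.eye(n1), np.zeros((N - n1, n1))])   # [I; 0]
    M0 = rk4_backward(lambda t, M: A_of_t(t) @ M, M_end, T, 0.0)
    d0 = rk4_backward(lambda t, d: A_of_t(t) @ d + b_of_t(t), np.zeros(N), T, 0.0)
    return M0, d0

# Hypothetical data: 5 stacked unknowns, the first 2 playing the role of c_i1.
rng = np.random.default_rng(1)
A0, A1, b0 = rng.normal(size=(5, 5)), rng.normal(size=(5, 5)), rng.normal(size=5)
A_of_t = lambda t: A0 + t * A1
b_of_t = lambda t: b0 * np.cos(t)
T, n1 = 0.5, 2
M0, d0 = presolve(A_of_t, b_of_t, n1, T)

# For any terminal value of the first block, (2.66) reproduces a direct solve.
c1_end = rng.normal(size=n1)
via_formula = M0 @ c1_end + d0
c_end = np.concatenate([c1_end, np.zeros(5 - n1)])
direct = rk4_backward(lambda t, c: A_of_t(t) @ c + b_of_t(t), c_end, T, 0.0)
```

Because the equation is linear, `via_formula` and `direct` agree up to rounding for every choice of `c1_end`, so `M0` and `d0` can be stored once and reused in each interval.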

2.4.5.  Summary  of  Algorithm 

We  will  now  summarize  the  required  calculations  in  the  following 
flow  chart.  The  major  steps  and  reference  to  the  related  equations  are 
given  in  the  order  in  which  they  must  be  computed. 

All  integrations  are  performed  over  only  one  sample  interval  if 
the  system  is  time  invariant. 


Going backwards from $j = r-1$ to $j = 1$, beginning with the known
$C_{i1}(t_f)$ from (2.48), the following calculations must be done for each $j$ in
order to obtain the boundary conditions for each interval.

[Flow chart]


2.4.6.  Implementation 

The controls can now be implemented forward in time. They are
found by (2.19), (2.22), the definitions of $y_3$, i.e., $y_3' = (\lambda' : p_1' : \cdots : p_m')$
and $y_1 = x$, and the evolution of $y_3(t)$ in each interval, $t \in [t_j, t_{j+1})$, given
by

$$ y_3(t) = P(t)\,y_1(t_j) \tag{2.67} $$

where

$$ P(t) = \bigl[\hat G_1(t)(H_1(t) + H_3(t)E_j) + \hat G_3(t)E_j\bigr] \tag{2.68} $$

which is derived from (2.39), (2.40) and (2.58).

If $P(t)$ is broken up as

$$ P(t) = \begin{bmatrix} P_0(t) \\ P_1(t) \\ \vdots \\ P_m(t) \end{bmatrix} $$

where each block $P_i(t)$ is $n$ by $n$, then the ith control during the jth
interval is

$$ u_i(t) = -R_{ii}^{-1}B_i'P_i(t)\,x(t_j). $$


As outlined above, there are a number of equations to be
integrated, some of which are of large dimension. These integrations, however,
are done once only and are performed over a period equal to the
length of only one sample interval. Thus, as the number of samples taken
increases, the computational burden is reduced. Computationally the only
limiting factor which prohibits us from allowing the length of the sample
intervals to become arbitrarily small is the corresponding increase in the
number of matrix inversions which must be performed at the sampling times
in order to generate the required boundary conditions for each interval.
That is, as the period of integration becomes smaller, these matrix inversions
will tend to become the dominant computational burden. The matrix
inversions present another difficulty since, in general, we are unable to
guarantee their existence.

2.4.7.  Comparison  of  Techniques 

The first technique discussed at the beginning of this section is
a method for converting the problem of repeatedly solving m coupled matrix
Riccati equations to that of solving m uncoupled matrix Riccati equations,
providing significant computational savings. These equations, however,
must still be solved repeatedly for each sample interval with only a change
in the boundary conditions.

The  second  approach  requires  a  set  of  linear  and  Riccati  equations 
to  be  solved  once  only  over  a  horizon  which  is  the  length  of  only  one  sample 
interval.  The  computational  advantage  of  this  second  technique  is  due  to 
the  fact  that  the  integrations  are  performed  over  only  one  sample  interval 
which is, in general, considerably shorter than the time horizon of the
original  problem. 

The  second  approach  has  an  advantage  over  the  first  approach  due 
to  the  fact  that  the  equations  which  are  to  be  solved  in  the  second 
technique  are  solved  only  once  for  a  period  equal  to  one  sample  interval 
while  the  equations  to  be  solved  in  the  first  technique  must  be  solved 
repeatedly  during  each  sample  interval.  However,  for  a  sufficiently  small 
sample  interval  it  has  been  observed  that  the  matrix  inversions  needed  to 
generate  the  boundary  conditions  in  the  second  technique  can  become  a 
dominant  factor.  Therefore,  the  advantage  shifts  to  the  first  technique  for 
the  case  of  decreasing  sample  interval. 

2.5.  Conclusions 

In  this  chapter,  a  sampled  data  equilibrium  Stackelberg  strategy 
has  been  considered.  The  advantages  of  the  formulation  can  be  seen  by 
considering  certain  characteristics  of  the  continuous  time  Stackelberg 
problem.  The  linear  quadratic,  continuous  time,  closed  loop  Stackelberg 


problem results in a solution, if it exists, in which the controls are nonlinear
functions of the state. Furthermore, the Stackelberg solution for
general  dynamic  games  does  not,  in  general,  satisfy  the  principle  of 
optimality.  The  principle  of  optimality  can  be  imposed  for  discrete  time 
games  but  the  procedure  for  doing  this  for  general  continuous  time  games 
has  not  been  established. 

The  sampled  data  equilibrium  Stackelberg  solution  results  in 
linear  control  laws  for  the  linear  quadratic  case.  The  advantage  of  linear 
control  laws  is  that  they  are  quite  simple  to  implement. 

In  deriving  the  sampled  data  equilibrium  Stackelberg  solution  we 
have  been  able  to  obtain  considerable  computational  savings.  That  is, 
rather  than  performing  integrations  over  the  entire  time  horizon  of  the 
original  problem,  we  are  able  to  imbed  the  subproblems  of  each  sample 
interval  into  a  more  general  formulation,  the  solution  of  which  requires 
integrations  over  a  period  equal  to  the  length  of  only  one  sample  interval. 
The  computational  technique,  an  application  of  invariant  imbedding  developed 
for  the  particular  case  of  a  Stackelberg  strategy  and  the  type  of  boundary 
conditions  peculiar  to  it,  is  quite  useful  for  many  problems,  in  particular 
for a variety of sampled data formulations.

CHAPTER 3

ON THE EXISTENCE OF STABILIZING SOLUTIONS
FOR THE STACKELBERG STRATEGY


3.1.  Introduction 

In  the  previous  chapter  we  developed  an  effective  method  for  the 
coordination of the decentralized control of a large system by imposing a
form  of  the  Stackelberg  strategy  and  exploiting  certain  characteristics  of 
the  strategy.  The  Stackelberg  strategy,  as  considered  in  Chapter  2,  is  one  of 
many  forms  in  which  it  might  arise.  Generally  it  is  of  interest  either  as  a 
control  strategy  to  be  imposed  on  a  given  problem,  such  as  for  coordination 
purposes,  or  it  may  arise  naturally  wherever  a  precedence  relationship  exists 
among  the  controllers. 

While  prescriptive  approaches  to  the  design  of  controllers  which 
satisfy  the  Stackelberg  strategy  have  been  developed  for  many  forms  of  the 
strategy,  little  is  known  about  the  existence  of  such  control  laws  or  if  the 
system  under  their  control  will  be  stabilized.  In  this  chapter,  we  will 
address the problem of the existence of stabilizing solutions for controllers
which are obtained according to a Stackelberg strategy.

In  order  to  consider  a  simple  form  of  the  Stackelberg  strategy  we 
will  examine  the  problem  of  a  linear  system  being  controlled  by  two  controllers 
where  the  control  laws  are  constrained  to  be  in  the  form  of  linear,  time- 
invariant  state  feedback.  The  cost  functions  are  assumed  to  be  defined  over 
an  infinite  horizon.  It  is  known  [6]  that  in  general  there  is  no  optimal 
linear  control  law  for  the  leader  so  we  consider  the  problem  in  which  the 
leader's  cost  function  is  modified  in  order  to  average  out  the  dependence  of 
the solution on the initial condition. This formulation is well posed with
respect to linear solutions. The necessary conditions for this problem have
been derived in [6].

Our  interest  in  the  problem  is  in  finding  out  if  there  exists  a 
stabilizing solution and if so, presenting conditions under which a stabilizing
solution can be guaranteed. This particular form of the Stackelberg
strategy,  i.e.,  linear  state  feedback,  is  considered  because  it  allows  us  to 
use  concepts  from  linear  systems  theory  in  approaching  the  problem. 

In  Section  3.2  we  will  introduce  the  concepts  needed  to  establish 
the  main  result  which  is  presented  in  Section  3.3. 

3.2.  Background 

It  will  be  assumed  throughout  that  we  are  dealing  with  a  linear 
time  invariant  system  and  linear  time  invariant  control  laws. 

3.2.1.  Controllable  subspaces 

For the multi-controller case, the individual controllable subspaces
are not invariant with respect to feedback. For example, for the two
controller case

$$ \langle A + B_1F_1 \mid \mathcal{B}_2 \rangle \ne \langle A \mid \mathcal{B}_2 \rangle $$

in general, where

$$ \langle A \mid \mathcal{B} \rangle = \mathcal{B} + A\mathcal{B} + \cdots + A^{N-1}\mathcal{B} $$

where $N$ is the dimension of the system and $\mathcal{B} = \mathcal{R}(B)$, the range of $B$. So the
controllable subspace of one controller can be altered by feedback by another controller.

The jointly controllable subspace

$$ \langle A \mid \mathcal{B}_1 \rangle + \langle A \mid \mathcal{B}_2 \rangle, $$

say, is invariant with respect to feedback. This is true for any number of
controllers.
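A two-state example makes both statements concrete. The sketch below assumes NumPy; the matrices and the gain `F1` are hypothetical values chosen only for illustration, and subspace dimensions are obtained from the rank of the usual controllability matrix.

```python
import numpy as np

def ctrb_dim(A, B):
    """Dimension of <A|B> = Im B + A Im B + ... + A^(N-1) Im B."""
    N = A.shape[0]
    blocks, AkB = [], B
    for _ in range(N):
        blocks.append(AkB)
        AkB = A @ AkB
    return int(np.linalg.matrix_rank(np.hstack(blocks)))

A = np.zeros((2, 2))
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
F1 = np.array([[0.0, 1.0]])               # hypothetical gain for controller 1

dim_open = ctrb_dim(A, B2)                # <A | B2>
dim_closed = ctrb_dim(A + B1 @ F1, B2)    # <A + B1 F1 | B2>
B12 = np.hstack([B1, B2])
dim_joint_open = ctrb_dim(A, B12)
dim_joint_closed = ctrb_dim(A + B1 @ F1, B12)
```

Here controller 2 alone reaches a one-dimensional subspace, but once controller 1 closes its loop controller 2 reaches the whole state space, while the dimension of the jointly controllable subspace is unchanged by the feedback.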

For the two DM case we can denote

$$ R_i(F_j) = \langle A + B_jF_j \mid \mathcal{B}_i \rangle, \qquad i \ne j, \quad i = 1, 2 $$

and the following is readily verified.

Lemma 3.1: For $i = 1, 2$ the subspace $R_i$ depends on $F_j$, $j \ne i$, and does not depend
on $F_i$.

Thus the notation $R_i(F_j)$, $j \ne i$, is justified.

If we define $R_i^\perp$ to denote the space perpendicular to $R_i$ then, for
given $F_1$ and $F_2$:

$R_i(F_j)$ is the smallest $(A + B_jF_j)$-invariant subspace containing $\mathcal{B}_i$,
$i \ne j$, $i = 1, 2$.

$R_i^\perp(F_j)$ is the largest $(A + B_jF_j)'$-invariant subspace contained in
$\mathcal{N}(B_i')$, $i \ne j$, $i = 1, 2$, where $\mathcal{N}(A)$ = null space of $A$.

The following subspace definitions will be useful. The system triple
$(A, B_1, B_2)$ uniquely determines the subspaces defined as follows:

$R_i^*$ : largest subspace $\mathcal{V}$ such that $\mathcal{V} \subset R_i(F_j)$ for all $F_j$, $j \ne i$.

The $R_i^*$ can be thought of as the greatest lower bound (in the sense of subspace
inclusion) for the set of subspaces $R_i(F_j)$ over all $F_j$. The definition
of these subspaces is invariant with respect to feedback, i.e., they are the
same whether we consider the system $(A, B_1, B_2)$ or $((A + B_1F_1 + B_2F_2), B_1, B_2)$ for
any $F_1$, $F_2$.

3.2.2.  Criterion  Subspaces 

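For the quadratic criterion function

$$ J_i = \tfrac{1}{2}\int_{t_0}^{\infty}\Bigl(x'Q_ix + \sum_{j=1}^{m} u_j'R_{ij}u_j\Bigr)dt $$

and the system

$$ \dot x = Ax + \sum_{j=1}^{m} B_ju_j, \qquad x(t_0) = x_0, \qquad u_i = F_ix, $$

the criterion subspace is defined

$$ \mathcal{J}_i(F_j;\ j = 1, \ldots, m) \triangleq \langle \bar A' \mid \mathcal{C}_i \rangle + \sum_{j=1}^{m} \langle \bar A' \mid \mathcal{F}_{ij} \rangle $$

where

$$ \bar A \triangleq A + \sum_{j=1}^{m} B_jF_j, \qquad \mathcal{C}_i = \mathcal{R}(C_i'), \qquad \mathcal{F}_{ij} = \mathcal{R}(F_j'R_{ij}), $$

$$ C_i'C_i = Q_i \ge 0, \qquad R_{ij} \ge 0, \qquad R_{ii} > 0, \qquad i, j = 1, \ldots, m. $$

If $x_0 \in \mathcal{J}_i$, $x_0 \ne 0$, then $J_i > 0$ (possibly infinite). We say that the subspace
$\mathcal{J}_i$ is observable through the criterion function $J_i$.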

The  criterion  subspaces  have  the  following  property. 

Lemma  3.2: 

m  j.n  m 

for all $F_j$ and where $\mathcal{J}_i^0$ is the open loop $\mathcal{J}_i$ with $F_j = 0$, $j = 1, \ldots, m$.

The  proof  of  this  follows  in  a  straightforward  way  from  Theorem  3.6 

of  [22]. 

Lemma  3.2  tells  us  that  observability  is  preserved  under  feedback. 
Another  direct  consequence  of  Theorem  3.6  of  [22]  is  the  following: 
Lemma 3.3: If for some $i$, the $R_{ij}$, $j = 1, \ldots, m$, are all positive-definite,
$R_{ij} > 0$, and the system is detectable through $J_i^0$, i.e., the open loop $J_i$
when all feedback gains equal zero, then the system will remain detectable
through $J_i(F_j;\ j = 1, \ldots, m)$ for all $F_j$.

If we define $\mathcal{J}_i^\perp$ to be the subspace perpendicular to $\mathcal{J}_i$ then, for
given $F_1$ and $F_2$, in the two controller case, for $R_{ij} = 0$, $j \ne i$,

$\mathcal{J}_1(0, F_2)$ is the smallest $(A + B_2F_2)'$-invariant subspace containing
$\mathcal{R}(C_1')$

and

$\mathcal{J}_2^\perp(F_1, 0)$ is the largest $(A + B_1F_1)$-invariant subspace contained in
$\mathcal{N}(C_2)$.

The subspaces $\mathcal{J}_2(F_1, 0)$ and $\mathcal{J}_1^\perp(0, F_2)$ can be defined similarly.

At  this  point,  we  need  to  be  familiar  with  the  following  concepts 
of  (A,B)-invariant  subspaces  [22]. 

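A subspace $\mathcal{V}$ is (A,B)-invariant if and only if there exists an $F$
such that

$$ (A + BF)\mathcal{V} \subset \mathcal{V}. $$

If $S$ is a set of subspaces then the supremal subspace $\mathcal{V}^*$ of the set $S$ is
defined as the subspace $\mathcal{V}^*$ such that $\mathcal{V}^* \in S$ and for every $\mathcal{V}' \in S$, $\mathcal{V}' \subset \mathcal{V}^*$. If the
supremal subspace exists, it is unique.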
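The definition of (A,B)-invariance above can be tested numerically without searching for a gain $F$. The sketch below assumes NumPy and relies on the standard equivalent geometric condition $A\mathcal{V} \subset \mathcal{V} + \mathcal{R}(B)$, which is not stated in the text; the function name and the example matrices are hypothetical.

```python
import numpy as np

def is_ab_invariant(A, B, V, tol=1e-9):
    """True if A V is contained in V + Im B (the columns of V span the subspace)."""
    base = np.hstack([V, B])
    r_base = np.linalg.matrix_rank(base, tol=tol)
    r_ext = np.linalg.matrix_rank(np.hstack([base, A @ V]), tol=tol)
    return bool(r_ext == r_base)

A = np.array([[0.0, 1.0], [0.0, 0.0]])
e1 = np.array([[1.0], [0.0]])
e2 = np.array([[0.0], [1.0]])

inv_a = is_ab_invariant(A, e2, e1)   # A e1 = 0, so span{e1} qualifies
inv_b = is_ab_invariant(A, e2, e2)   # A e2 = e1 is outside span{e2} + Im B
inv_c = is_ab_invariant(A, e1, e2)   # A e2 = e1 lies in Im B
```

The second and third cases use the same subspace with different input matrices, which shows that the property depends on the pair $(A, B)$ and not on $A$ alone.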

For some subspace $Q$, the set of all (A,B)-invariant subspaces
contained in $Q$ always has a supremal element. With this background we can
make  the  following  observation. 

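Lemma 3.4: There exist unique $\mathcal{V}_1^*$ and $\mathcal{V}_2^*$ such that

$$ \mathcal{J}_i^\perp(F_1, F_2) \subset \mathcal{V}_i^* \quad \text{for all } F_1, F_2, $$

i.e.,

$$ \mathcal{J}_2^\perp(F_1, F_2) \subset \mathcal{J}_2^\perp(F_1, 0) \subset \mathcal{V}_2^* $$

$$ \mathcal{J}_1^\perp(F_1, F_2) \subset \mathcal{J}_1^\perp(0, F_2) \subset \mathcal{V}_1^* $$

where $\mathcal{V}_i^*$ is the supremal $(A, B_j)$-invariant subspace contained in $\mathcal{N}(C_i)$, $j \ne i$.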


which

with

$$ E\{x_0\} = 0 \quad \text{and} \quad E\{x_0x_0'\} = X_0 > 0. $$

The necessary conditions for this problem were derived in [6]. The
conditions for the infinite horizon problem are as follows

F2  -  -R22B2K2  *  K2  "  M2 

a'k2  +  k2a  +  k2s22k2  +  f;r21f1  4.  q2  .  0 

+  V  +  K2S12K2  +  F1R11F1  +  Ql  -  0 

V  +  an2  -  s22miNi  -  niMis22  +  s12m2Ni  +  Wn  -  0 

N.a'  +  AN,  +  X  -  0 
1  1  o 

R21F1N2  +  RllFlNl"Bl(M2N2  +  MlNl)  "  0 


where the last equation is solved for $F_1$.

We would like to answer the question: Under what conditions can
we guarantee that there will exist a solution $(F_1^*, F_2^*)$ such that the resultant
system

$$ \dot x = (A + B_1F_1^* + B_2F_2^*)x $$

is asymptotically stable?

For a given $F_1$, the follower is faced with a conventional optimization
for which it is known [23] that if the triple $(C_2, A + B_1F_1, B_2)$ is
stabilizable and detectable, then there will exist a unique optimal $F_2^*$ and
that the system matrix $(A + B_1F_1 + B_2F_2^*)$ will be stable. So, we ask under what
conditions does there exist an optimal $F_1^*$ and if it exists, under what
conditions will it be chosen such that $(C_2, A + B_1F_1^*, B_2)$ will be stabilizable
and detectable?

Theorem 3.1: Existence of stabilizing solution.

We assume that the system is jointly controllable

$$ R_1(0) + R_2(0) = \mathbb{R}^N $$

and we assume that $R_{12} > 0$, i.e., the leader has a positive definite penalty on the
follower's control action. If

i)  The  system  is  observable  through 
and 

ii)  £  Rj 

then there exists an optimal $F_1^*$ and an optimal $F_2^*$ and the resultant closed
loop system

$$ (A + B_1F_1^* + B_2F_2^*) $$

will be asymptotically stable.

Before  proving  Theorem  3.1,  some  preliminary  results  are  needed. 
After DM$_1$ applies feedback, the controllable subspaces are $R_1(0)$
and $R_2(F_1)$. We have assumed joint controllability so

$$ R_1(0) + R_2(F_1) = \mathbb{R}^N. $$

Note that $(A + B_1F_1)R_2(F_1) \subset R_2(F_1)$, i.e., $R_2(F_1)$ is $(A + B_1F_1)$-invariant.

Define the factor space

$$ X = \mathbb{R}^N / R_2(F_1). $$

This space is isomorphic to $\hat R_1(F_2)$ where $\hat R_1(F_2)$ is defined as follows:

If $R_0(F_1, F_2) = R_1(F_2) \cap R_2(F_1)$ then let $\hat R_i(F_j)$ be any subspace such that

$$ R_i(F_j) = R_0(F_1, F_2) \oplus \hat R_i(F_j), \qquad j \ne i. $$

Since $X$ is isomorphic to $\hat R_1(F_2)$, i.e.,

$$ \mathbb{R}^N / R_2(F_1) \cong \hat R_1(F_2), $$

let

$$ \bar{\mathcal{B}}_1 = (\mathcal{B}_1 + R_2)/R_2, \qquad R_2 = R_2(F_1) $$

and let $A_{11}$ be the map induced by $A$ on $X$. Then,

Lemma 3.5:

$$ \langle A_{11} \mid \bar{\mathcal{B}}_1 \rangle = X \cong \hat R_1. $$

Proof: Proposition 1.2 of [22].

Now, corresponding to $\hat R_1(F_2)$ and $R_2(F_1)$ there is a basis such
that the matrix $(A + B_1F_1)$ will be of the form

$$ \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} $$

where

$$ A_{22} = (A + B_1F_1)\,|\,R_2 $$

$$ A_{11}P = P(A + B_1F_1) $$

where $P$ is the canonical projection

$$ P : \mathbb{R}^N \to X. $$

So from Lemma 3.5 we see that the eigenvalues of $A_{11}$ can be placed
arbitrarily by DM$_1$.

In the sequel, when referring to the system, we mean $(C_2, A + B_1F_1, B_2)$
unless noted otherwise.

We will use the notation $X^+(A)$ and $X^-(A)$ defined as follows:

$X^+(A)$ is the space spanned by the eigenvectors corresponding to the unstable
eigenvalues.

$X^-(A)$ is the space spanned by the eigenvectors corresponding to the stable
eigenvalues.

The following lemmas will also be needed.

Consider the set

$$ \mathcal{S}_1 : \{F_1 \mid J_1(F_1, f_2(F_1)) \le k\} $$

where $f_2(F_1)$ represents the reaction of decision maker two to the controls
of decision maker one. That is, $f_2(F_1)$ is the implicit mapping defined by
the optimization performed by decision maker two.

Lemma 3.6: The mapping $F_2 = f_2(F_1)$ is continuous over the set of $F_1$ for which
the triple

$$ (C_2, (A + B_1F_1), B_2) $$

is stabilizable and detectable.

Proof: The mapping $F_2 = f_2(F_1)$ is defined implicitly by the solution for $K_2$
of the Riccati equation

$$ 0 = (A + B_1F_1)'K_2 + K_2(A + B_1F_1) - K_2B_2R_{22}^{-1}B_2'K_2 + Q_2 + F_1'R_{21}F_1 $$

and

$$ F_2 = -R_{22}^{-1}B_2'K_2. $$

The partial derivative of the Riccati equation with respect to its solution
$K_2$ is

$$ (I \otimes \bar A') + (\bar A' \otimes I) $$

where $\otimes$ is the Kronecker product (i.e., $A \otimes B = (a_{ij}B)$) and where
$\bar A = (A + B_1F_1 + B_2F_2)$.

If $F_1$ is such that the system is stabilizable and detectable then
$\bar A$ will be stable. A characteristic of Kronecker products is that if $\bar A$ is
stable then so is

$$ (I \otimes \bar A') + (\bar A' \otimes I) $$

and since it has no eigenvalues $\lambda = 0$, it is nonsingular.
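The Kronecker-product property used here can be checked numerically: the eigenvalues of $(I \otimes \bar A') + (\bar A' \otimes I)$ are the pairwise sums of the eigenvalues of $\bar A$, so a stable $\bar A$ leaves no zero eigenvalue. The sketch assumes NumPy, and the matrix `A_cl` is a hypothetical stable closed-loop matrix chosen for illustration.

```python
import numpy as np

A_cl = np.array([[-1.0, 2.0],
                 [0.0, -3.0]])      # hypothetical stable matrix, eigenvalues -1 and -3
n = A_cl.shape[0]
I = np.eye(n)

# The operator appearing in the proof of Lemma 3.6.
K = np.kron(I, A_cl.T) + np.kron(A_cl.T, I)

eig_K = np.sort(np.linalg.eigvals(K).real)
lam = np.linalg.eigvals(A_cl)
pair_sums = np.sort(np.array([(li + lj).real for li in lam for lj in lam]))
```

For this example both `eig_K` and `pair_sums` contain the values -6, -4, -4 and -2, all strictly negative, so `K` is nonsingular.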

By the Implicit Function Theorem, if a function $f(x, y)$ is differentiable
with respect to $x$ and $y$ and if

$$ 0 = f(\bar x, \bar y) $$

where

$$ f : X \times Y \to X $$

then there exists a neighborhood $N$ of $\bar y$ in $Y$ over which a continuous,
differentiable (implicit) function $g(y)$ is defined such that

$$ 0 = f(g(y), y), \qquad y \in N, $$

if $\left.\dfrac{\partial f}{\partial x}\right|_{\bar x, \bar y}$ is nonsingular.

Therefore the function $F_2 = f_2(F_1)$ is continuous (and differentiable) over the
set of $F_1$ for which the system is stabilizable and detectable.

The set $\mathcal{S}_1$ is a subset of

$$ \mathcal{S}_2 = \{F_1 \mid J_1(F_1, F_2) \le k;\ \text{for any } F_2\}. $$

Lemma 3.7: The set $\mathcal{S}_2$ is a subset of $\mathcal{S}_3$ for a given $k$, a sufficiently
large $f$ and a sufficiently small $\epsilon > 0$ where

$$ \mathcal{S}_3 : \{F_1 \mid \|F_1\| \le f;\ \mathrm{Re}\{\lambda(\bar A)\} \le -\epsilon,\ \epsilon > 0,\ \text{for any } F_2\}. $$

Note that $\mathcal{S}_3$ is closed and bounded.

Proof: For our case where we have $R_{11} > 0$ and $R_{12} > 0$, Lemma 3.7 follows from
Lemma 3.8 which we will prove in detail. (Lemma 3.8 is in a more convenient
form to work with.)

Lemma 3.8: For

$$ \dot x = Ax + Bu $$

$$ J = \tfrac{1}{2}E_{x_0}\Bigl\{\int_{t_0}^{\infty}(x'Qx + u'Ru)\,dt\Bigr\} $$

$$ R > 0, \qquad E\{x_0\} = 0, \qquad E\{x_0x_0'\} = I $$

$$ C'C = Q \ge 0 $$

$(C, A)$; observable

$(A, B)$; stabilizable

$$ \mathcal{S}_a = \{F \mid J(F) \le k\} $$

$$ \mathcal{S}_b = \{F \mid \|F\| \le f;\ \mathrm{Re}(\lambda(A + BF)) \le -\epsilon\}. $$

For a given $k$, there exists a sufficiently large $f < \infty$ and a sufficiently
small $\epsilon > 0$ such that

$$ \mathcal{S}_a \subset \mathcal{S}_b. $$

A similar result for the case of $Q > 0$ has been obtained in [25]
for output feedback.

Proof of Lemma 3.8: For any $F$ for which $J(F)$ is finite the system will be
stabilized and the cost will be

$$ J(F) = \mathrm{tr}(L) $$

where $L$ satisfies

$$ L\bar A + \bar A'L + Q + F'RF = 0 \tag{3.1} $$

where $\bar A \triangleq (A + BF)$.
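Equation (3.1) is linear in $L$, so for a given stabilizing gain it can be solved directly. The sketch below assumes NumPy and solves the equation through its Kronecker-product (vectorized) form; the function name, the double-integrator data and the gain `F` are hypothetical, and the returned trace is the quantity the text denotes $\mathrm{tr}(L)$.

```python
import numpy as np

def closed_loop_cost(A, B, Q, R, F):
    """Return (tr(L), L) with L solving (3.1), or (inf, None) if A + BF is not stable."""
    A_cl = A + B @ F
    if np.max(np.linalg.eigvals(A_cl).real) >= 0.0:
        return np.inf, None
    n = A.shape[0]
    I = np.eye(n)
    # vec(L A_cl + A_cl' L) = (A_cl' (x) I + I (x) A_cl') vec(L), column-major vec
    K = np.kron(A_cl.T, I) + np.kron(I, A_cl.T)
    rhs = -(Q + F.T @ R @ F).reshape(-1, order="F")
    L = np.linalg.solve(K, rhs).reshape((n, n), order="F")
    return float(np.trace(L)), L

A = np.array([[0.0, 1.0], [0.0, 0.0]])     # double integrator (illustrative)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
F = np.array([[-1.0, -2.0]])               # hypothetical stabilizing gain

J, L = closed_loop_cost(A, B, Q, R, F)
A_cl = A + B @ F
residual = L @ A_cl + A_cl.T @ L + Q + F.T @ R @ F   # should vanish
J_open, _ = closed_loop_cost(A, B, Q, R, np.zeros((1, 2)))
```

With the zero gain the double integrator is not stabilized, so the function reports an infinite cost, in line with the statement that a finite $J(F)$ requires a stabilized system.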

By taking the norm of equation (3.1), we establish that $F \in \mathcal{S}_a$ implies that

$$ \|F\| \le f $$

where

$$ f = \frac{b + \sqrt{b^2 + 4ac}}{2a} \tag{3.2} $$

where

$$ a \triangleq \min_{\|F\| = 1}\{\|F'RF\|\} > 0, \qquad b \triangleq 2k\|B\|, \qquad c \triangleq 2k\|A\| + \|Q\|. $$

It remains to show that $F \in \mathcal{S}_a$ implies the existence of an $\epsilon > 0$ such that

$$ \mathrm{Re}(\lambda(A + BF)) \le -\epsilon. $$

Define

$$ \bar Q \triangleq Q + F'RF, \qquad \tilde Q(F) \triangleq \sum_{i=0}^{m-1}(A + BF)'^{\,i}\,\bar Q\,(A + BF)^{i}. $$

Notice that

$$ \mathcal{R}(A'^{\,i}C') = \mathcal{R}(A'^{\,i}C'CA^{i}), \qquad \langle A' \mid \mathcal{C}' \rangle = \mathcal{R}(\tilde Q(F)), $$

so that

$$ \tilde Q(F) > 0. $$

If we pre- and post-multiply equation (3.1) by $\bar A'^{\,i}$ and $\bar A^{i}$ respectively, and
sum these equations over $i = 0, 1, \ldots, m-1$, we have

$$ \tilde L\bar A + \bar A'\tilde L + \tilde Q = 0 $$

where

$$ \tilde L \triangleq \sum_{i=0}^{m-1}\bar A'^{\,i}L\bar A^{i}. $$

Since $\tilde Q > 0$, the set $\|F\| \le f$ is closed and bounded and since the
eigenvalues of a matrix depend continuously on the elements of the matrix,
there exists a $q_{\min}$,

$$ 0 < q_{\min} = \min_{\|F\| \le f}\ \min_i\,(\lambda_i(\tilde Q)). $$

Notice also that

$$ \|L\| \le \sum_{i=1}^{m}\lambda_i(L) = \mathrm{tr}(L) $$

for any $L = L' \ge 0$ where

$$ \|L\| = \sup_{|x| = 1}|Lx| = \lambda_{\max}(L). $$

Thus

$$ \|L\| \le J(F) \le k $$

for $F \in \mathcal{S}_a$, and

$$ \|\tilde L\| \le k\sum_{i=0}^{m-1}(\|A\| + \|B\| \cdot f)^{2i} \triangleq \ell. $$

For an eigenvalue $\mu$ of $\bar A$ with unit eigenvector $x$, the equation for $\tilde L$ gives
$2\,\mathrm{Re}(\mu)\,x'\tilde Lx + x'\tilde Qx = 0$, so a sufficient condition for
$\mathrm{Re}\{\lambda(\bar A)\} \le -\epsilon$ is that

$$ \|\tilde L\| \le \frac{q_{\min}}{2\epsilon}; $$

therefore for $F \in \mathcal{S}_a$ we have

$$ \mathrm{Re}\{\lambda(A + BF)\} \le -\epsilon $$

for $\epsilon$ such that

$$ \epsilon \le \frac{q_{\min}}{2\ell}. \tag{3.3} $$

So, for $f$ as in (3.2) and $\epsilon$ from (3.3) we have

$$ \mathcal{S}_a \subset \mathcal{S}_b. $$

The proof of Theorem 3.1 will be done in three parts.

First, we show that, under the conditions of the theorem, there
exists a leader's control that will make the system stabilizable for the
follower and that in order for $J_1$ to be finite the leader must choose $F_1$ such
that the follower will stabilize the system.

Second, we show that there exists a leader's control such that the
system is detectable by the follower and that for such an $F_1$, the follower will
stabilize the system.

The first two parts will establish that there exists an $F_1$ such that
$J_1$ and $J_2$ are finite and we establish that the control gains considered by DM$_1$
can be restricted to a set for which the follower will then be faced with a
stabilizable and detectable problem. In the third part we show that the
leader's optimization can be considered to be over a closed, bounded set on
which the follower's control depends continuously on the leader's control and
so the leader's cost function will be continuous in $F_1$ over this set. These
conditions are sufficient to establish the desired results.


Part  1 


By Lemma 3.5 we know that there exists an $F_1$ such that the system $(A + B_1F_1, B_2)$ is stabilizable by the follower. Now we show that the leader must make the system stabilizable to have a finite $J_1$.

We can express $J_1$ as

$$ J_1 = \tfrac{1}{2} E\{\lim_{t \to \infty} (x_0' V(t) x_0)\} $$

where

$$ V(t) = \int_0^t e^{s\hat{A}'} (C_1'C_1 + F_1'F_1 + F_2'F_2)\, e^{s\hat{A}}\, ds $$

with $\hat{A} = (A + B_1F_1 + B_2F_2)$ and, without loss of generality, assume $R_{11} = I$ and $R_{12} = I$. If $\hat{A}$ is unstable let $\mu$ be an eigenvalue with $\operatorname{Re}(\mu) \ge 0$ and let $x$ be the corresponding eigenvector.

Then

$$ x'Vx = \int_0^t e^{2s\operatorname{Re}(\mu)} (|C_1x|^2 + |F_1x|^2 + |F_2x|^2)\, ds. $$

If $J_1$ is to be finite then this integral must be bounded as $t \to \infty$. For the integral to be bounded, we must have

$$ C_1x = 0, \quad F_1x = 0, \quad \text{and} \quad F_2x = 0 $$

so,

$$ \hat{A}^{i-1}x = \mu^{i-1}x $$

$$ C_1\hat{A}^{i-1}x = \mu^{i-1}C_1x = 0 $$

$$ F_j\hat{A}^{i-1}x = \mu^{i-1}F_jx = 0, \qquad i = 1, \ldots, n, \quad j = 1, 2 $$

but this implies that

$$ x \in \mathcal{N}_1(F_1, F_2). $$

This must be true for all unstable eigenvalues of $\hat{A}$, therefore

$$ X^+(\hat{A}) \subset \mathcal{N}_1(F_1, F_2) $$

but if condition i) holds then by Lemma 3.3

$$ \mathcal{N}_1(F_1, F_2) = \phi $$

$$ X^+(\hat{A}) = \phi $$

which for any $F_2$ is equivalent to

$$ X^+(A + B_1F_1) \subset \mathcal{R}_2(F_1). $$

Thus $(A + B_1F_1, B_2)$ is stabilizable by DM2. So we have shown that if $J_1$ is to be finite then DM1 must choose $F_1$ such that $(A + B_1F_1, B_2)$ is stabilizable.

Part 2

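There exists an $F_1$ such that $\operatorname{Re}(\lambda((A + B_1F_1) \mid \mathcal{R}_1(0))) < 0$ and so

$$ \mathcal{R}_1(0) \cap X^+(A + B_1F_1) = 0. $$

But by condition ii)

$$ \mathcal{N}_2(F_1, 0) \subset \mathcal{R}_1^* \subset \mathcal{R}_1(0). $$

Therefore

$$ X^+(A + B_1F_1) \cap \mathcal{N}_2(F_1, 0) = \phi, $$

i.e., the system is detectable for such an $F_1$. The system is also stabilizable for such an $F_1$, therefore the follower's resultant control $F_2$ will cause the system $(A + B_1F_1 + B_2F_2)$ to be asymptotically stable and $J_1 < \infty$ and $J_2 < \infty$.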

Part  3 

In the previous sections we have established that there exists an $F_1$ such that the triple

$$ (C_2, (A + B_1F_1), B_2) $$

will be stabilizable and detectable and that in order for the leader to have a finite cost, such an $F_1$ must be chosen. So, the optimal $F_1^*$, if it exists, must make the system stabilizable and detectable. It remains to establish that a minimizing control exists. If we can establish that the minimizing control, $F_1^*$, if it exists, will be contained in a closed bounded set $\mathcal{S}$ and that the leader's cost function is continuous with respect to $F_1$ over the set $\mathcal{S}$, then by the Weierstrass theorem the minimum is attained in $\mathcal{S}$, i.e., $F_1^*$ exists [24].


From  this.  Lemma  3.7  follows.  From  Lemma  3.7 


3,  C  «i  C  3 
1  2 


3 


and the optimization can be done over a closed, bounded set over which the cost function is continuous in $F_1$. By the Weierstrass theorem an optimal stabilizing solution exists.


3.4.  Verification  of  Conditions 

Most of the conditions for the existence of a stabilizing solution can be tested by well established techniques. The controllability of $(A, B)$, where $B = [B_1, B_2]$ is a concatenation of $B_1$ and $B_2$, is readily checked, as is the positive definiteness of the various matrices. Checking the condition

$$ \mathcal{N}_2^* \subset \mathcal{R}_1^* $$

however deserves some further discussion.

The subspace $\mathcal{N}_2^*$ is the supremal $(A, B_1)$-invariant subspace contained in $\mathcal{N}(C_2)$. Algorithms for calculating a set of vectors which span a supremal subspace have been considered by a number of authors, most notably in [26], where attention is paid to the reliable computation of components of supremal subspaces by algorithms whose stability and efficiency can be ensured.

The subspace $\mathcal{R}_1^*$ is defined as the largest space $\mathcal{T}$ such that $\mathcal{T} \subset \mathcal{R}_1(F_2)$ over all possible $F_2$. That is, $\mathcal{R}_1^*$ is the greatest lower bound, in the sense of subspace inclusion, for the set of $\mathcal{R}_1(F_2)$. The inclusion of $\mathcal{N}_2^*$ within $\mathcal{R}_1^*$ is to be tested for and so, although it is not clear how to efficiently calculate $\mathcal{R}_1^*$ exactly, any computed subspace of $\mathcal{R}_1^*$ for which the inclusion holds is sufficient to establish the desired result. Notice that unlike $\mathcal{N}_2^*$, there does not in general exist an $F_2$ such that $\mathcal{R}_1(F_2) = \mathcal{R}_1^*$. Relationships that do hold and are useful are


„uwc«isnw 


We might consider finding the $\mathcal{R}_1(F_2)$ of maximum dimension. The maximum dimension of $\mathcal{R}_1(F_2)$ over all $F_2$ is unique but there is no unique $\mathcal{R}_1(F_2)$ with maximum dimension. The calculation of an $\mathcal{R}_1(F_2)$ of minimum dimension (not unique), and thus the corresponding $\mathcal{R}_1(F_2)$, can be done by the index and decomposition algorithm of [27]. If $\mathcal{N}_2^*$ is contained in one such maximum $\mathcal{R}_1(F_2)$ then this is sufficient to establish the result. The union of a finite collection of arbitrarily generated subspaces $\mathcal{R}_1(F_2)$ might also be considered. As a check, for a finite set of arbitrary $F_2$ and a candidate space $\mathcal{T} \subset \mathcal{R}_1^*$, if

$$ \mathcal{T} = \bigcap_{\text{finite set of } F_2} \mathcal{R}_1(F_2) \qquad (3.4) $$

then $\mathcal{T} = \mathcal{R}_1^*$ since the right hand side of (3.4) is an upper bound for $\mathcal{R}_1^*$.
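The check in (3.4) can be sketched numerically. The following is a minimal illustration, assuming NumPy is available and assuming that $\mathcal{R}_1(F_2)$ denotes the controllable subspace of the pair $(A + B_2F_2, B_1)$. The three-state matrices, the two trial gains, the tolerance, and the helper names `orth`, `ctrb_subspace`, and `intersect` are hypothetical choices made for this example; they are not the algorithms of [26] or [27].

```python
import numpy as np

def orth(M, tol=1e-9):
    """Orthonormal basis for the column space of M (via SVD)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol]

def ctrb_subspace(A, B):
    """Basis for the controllable subspace span[B, AB, ..., A^(n-1)B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return orth(np.hstack(blocks))

def intersect(bases, tol=1e-9):
    """Basis for the intersection of subspaces given by orthonormal bases.

    x lies in every subspace iff (I - U U') x = 0 for each basis U.
    """
    n = bases[0].shape[0]
    stacked = np.vstack([np.eye(n) - U @ U.T for U in bases])
    _, s, Vt = np.linalg.svd(stacked)
    return Vt[s <= tol].T

# Hypothetical three-state example.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
B1 = np.array([[0.0], [1.0], [0.0]])
B2 = np.array([[0.0], [0.0], [1.0]])

# A finite, arbitrary set of gains F2 (1 x 3 row vectors).
gains = [np.zeros((1, 3)), np.array([[0.0, 1.0, 0.0]])]
bases = [ctrb_subspace(A + B2 @ F2, B1) for F2 in gains]

upper_bound = intersect(bases)   # right-hand side of (3.4)
candidate = orth(B1)             # range(B1) lies in every R_1(F_2)

dim_upper = upper_bound.shape[1]
dim_cand = candidate.shape[1]
# Containment plus equal dimensions means the two subspaces coincide.
contained = np.allclose(upper_bound @ (upper_bound.T @ candidate), candidate)
print(dim_upper, dim_cand, contained)
```

In this toy case the candidate is contained in every $\mathcal{R}_1(F_2)$ by construction and has the same dimension as the finite intersection, so the two coincide and the candidate equals $\mathcal{R}_1^*$.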


3.5.  Conclusions 

Conditions have been derived which are sufficient to ensure the existence of stabilizing feedback gains which satisfy the Stackelberg strategy. These conditions are only sufficient, and less restrictive conditions may exist. Although it is possible to ensure the existence of a minimizing control for the leader, the computation of an optimal $F_1$ has not yet been dealt with and requires investigation.


CHAPTER  4 


AN EXAMPLE OF THE IMPACT OF THE INFORMATION STRUCTURE
4.1.  Introduction 

In  noncooperative  decentralized  decision  making,  the  information 
available  to  the  controllers  has  a  far  more  significant  role  than  in  the 
case  of  a  centralized,  or  even  decentralized,  single  objective  control  problem. 
In  Chapters  4  and  5  we  will  investigate  the  Nash  strategy  and  the  role  that 
the  information  structure  has  in  the  determination  of  the  controls  and  the 
resultant  cost  incurred  by  each  controller.  In  Chapter  4  an  example  is 
presented  and  discussed  in  which  a  decision  maker  becomes  worse  off  when 
more  information  is  made  available  to  him.  This  demonstrates  that  more 
information  is  not  necessarily  better  and,  more  generally,  it  demonstrates 
that  the  choice  of  an  information  structure  must  be  done  in  a  systematic 
fashion.  In  Chapter  5  such  a  systematic  approach  is  developed. 


4.2.  An  Example 

A  problem  in  which  there  are  many  controllers,  each  having  a 
different  objective,  can  be  formulated  as  a  differential  game  with  the 
controllers  acting  according  to  a  particular  strategy.  In  a  decentralized 
problem,  where  each  controller  has  different,  incomplete  information,  the 
information  structure  can  have  a  significant  and  sometimes  surprising  impact 
on  the  solution. 

We examine a fairly realistic problem of a two-area electric power distribution system in which the two area controllers determine constant output feedback gains according to a Nash strategy. An example demonstrates a situation in which one controller is worse off when more information is made available to him.

This phenomenon has been noted previously [28], [29], and [30] for static and one-step dynamic systems where the amount of information is characterized in terms of the statistics of noisy measurements. We consider a differential game where the controllers apply output feedback of perfect measurements. The amount of information is characterized in the following sense. Decision maker i (DMi) measures $y_i = C_ix$ where $x$ is the state of the system and $C_i$ is a matrix of appropriate dimension. We say that $y_i^1 = C_i^1x$ is more informative than $y_i^2 = C_i^2x$ if $\mathcal{R}(C_i^{2\prime}) \subset \mathcal{R}(C_i^{1\prime})$, where $\mathcal{R}(A)$ is the range space of the matrix $A$, i.e., $y_i^1$ is composed of the measurements $y_i^2$ (to within an isomorphic transformation) plus additional linearly independent measurement(s).
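The definition can be checked numerically as a rank condition: the rows of $C_i^2$ must lie in the row space of $C_i^1$, and $C_i^1$ must contribute at least one additional independent row. The following is a minimal sketch, assuming NumPy is available; the function name `more_informative`, the tolerance, and the example matrices are hypothetical.

```python
import numpy as np

def more_informative(C1, C2, tol=1e-9):
    """Return True if y1 = C1 x is more informative than y2 = C2 x.

    The row space of C2 must be contained in the row space of C1
    (stacking C2 under C1 adds no rank), and C1 must have strictly
    larger rank than C2.
    """
    r1 = np.linalg.matrix_rank(C1, tol=tol)
    r2 = np.linalg.matrix_rank(C2, tol=tol)
    r12 = np.linalg.matrix_rank(np.vstack([C1, C2]), tol=tol)
    return bool(r12 == r1 and r1 > r2)

# Hypothetical example with a three-dimensional state.
C_small = np.array([[1.0, 0.0, 0.0]])   # measures x1 only
C_large = np.array([[2.0, 0.0, 0.0],    # x1, rescaled
                    [1.0, 1.0, 0.0]])   # plus a new independent combination
C_other = np.array([[0.0, 0.0, 1.0]])   # measures x3 only

print(more_informative(C_large, C_small))
print(more_informative(C_small, C_large))
print(more_informative(C_large, C_other))
```

Only the first comparison satisfies the definition; in the third, neither measurement contains the other, so the two structures are not ordered.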

The effect of the information structure seems counter-intuitive at first, but will be readily understood once the significance of the "availability" of information is explained in terms of the strategy being employed.


4.3.  The  System 

We consider a two area electric power distribution system with a tie-line interconnection. The model is based on [31]. Each area has a steam plant and is modeled by a fifth order system, the states of which are the deviations of the area frequency, the actuator position and the power outputs of a high pressure turbine, an intermediate pressure turbine and a low pressure turbine. The two subsystems along with the tie-line power flow comprise an eleventh order system. The load disturbance of each area is modeled by a first order system so the combined power and disturbance systems comprise a thirteenth order model.

The  model  is  the  two  interconnected  system  model  for  steam  powered 
plants  derived  in  [31].  The  model  is  a  linearization  of  the  system  about  an 
operating  point,  describing  the  system  behavior  under  real  power  and  frequency 
variations.  The  state  vector  is 

$x_1$ - valve displacement - area one
$x_2$ - power displacement of high pressure turbine - area one
$x_3$ - power displacement of intermediate pressure turbine - area one
$x_4$ - power displacement of low pressure turbine - area one
$x_5$ - frequency deviation in area one
$x_6$ - tie-line power flow deviation - from area one into area two
$x_7$ - valve displacement - area two
$x_8$ - power displacement of high pressure turbine - area two
$x_9$ - power displacement of intermediate pressure turbine - area two
$x_{10}$ - power displacement of low pressure turbine - area two
$x_{11}$ - frequency deviation in area two
$x_{12}$ - load disturbance in area one
$x_{13}$ - load disturbance in area two.

The  controls  are 

$u_1$ - set point adjustment in area one
$u_2$ - set point adjustment in area two.

In case one, DM1 measures $x_5$ and in both cases, DM2 measures $x_{11}$. The system can be represented as

[system matrices not recoverable from the source]

$$ (Q_1)_{2,2} = (Q_2)_{8,8} \quad \text{[value illegible in the source]} $$

$$ (Q_1)_{3,3} = (Q_2)_{9,9} = .28 $$

$$ (Q_1)_{4,4} = (Q_2)_{10,10} = .42 $$

$$ (Q_1)_{5,5} = (Q_2)_{11,11} = 100. $$

Unit penalty on total area power generation.


The  Nash  game  will  determine  the  gains  of  each  of  the  area 
controllers.  We  assume  that  in  the  steady  state,  each  DM  will  meet  the  load 
demand  in  his  own  area,  i.e.,  the  steady  state  power  generated  in  area  i  is 
equal  to  the  steady  state  load  in  area  i.  For  simplicity,  the  feedforward 
gain  from  the  area  load  disturbance  is  calculated  such  that  if  a  step 
increase  in  load  were  to  occur  then  the  controller  for  that  area  alone  would, 
in  the  steady  state,  meet  the  new  demand.  Thus  these  feedforward  gains  are 
calculated  from  algebraic  steady  state  conditions  and  are  not  considered  as 
control  variables  in  the  Nash  calculations. 

The problem faced by each of the DMers is to minimize his average steady state cost when the system is subject to constantly varying load disturbances.

The overall system is of the form

$$ \dot{x} = Ax + B_1u_1 + B_2u_2 + Ev $$

where [the partitioned forms of the system matrices are not recoverable from the source]. The dimensions of these submatrices correspond to the dimensions of $x$ and $z$, where $x$ is the system state, $z$ is the load disturbance, and $v$ is a white noise process with zero mean and covariance $V$.

For $i = 1, 2$, DMi has measurement $y_i = C_ix$ and will apply linear output feedback $u_i = -F_iy_i$. DMi's cost function can be expressed [25]

$$ J_i = \lim_{t \to \infty} \tfrac{1}{2} E\{x(t)'Q_ix(t) + u_i(t)'R_{ii}u_i(t)\} $$

which, with feedback, is

$$ J_i = \lim_{t \to \infty} \tfrac{1}{2} E\{x(t)'(Q_i + C_i'F_i'R_{ii}F_iC_i)x(t)\}. $$

If we define the matrix

$$ S = \lim_{t \to \infty} E\{x(t)x(t)'\} $$

then

$$ J_i = \tfrac{1}{2} \operatorname{tr}\{S(Q_i + C_i'F_i'R_{ii}F_iC_i)\}. $$


The feedback gains $F_1^*$ and $F_2^*$ are Nash equilibrium values if

$$ J_1(F_1^*, F_2^*) \le J_1(F_1, F_2^*) \quad \text{for all admissible } F_1 \qquad (4.1) $$

$$ J_2(F_1^*, F_2^*) \le J_2(F_1^*, F_2) \quad \text{for all admissible } F_2. \qquad (4.2) $$

By application of the matrix minimum principle [32], [33], and [25], the following necessary conditions for the Nash equilibrium output feedback gains are obtained, $i = 1, 2$

$$ 0 = R_{ii}F_iC_iSC_i' - B_i'P_iSC_i' \qquad (4.3) $$

$$ 0 = Q_i + C_i'F_i'R_{ii}F_iC_i + \hat{A}'P_i + P_i\hat{A} \qquad (4.4) $$

$$ 0 = EVE' + \hat{A}S + S\hat{A}' \qquad (4.5) $$

where

$$ \hat{A} = (A - B_1F_1C_1 - B_2F_2C_2). $$
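Conditions (4.3) through (4.5) can be evaluated numerically for trial gains. The sketch below is a minimal illustration under stated assumptions: it uses NumPy, solves the two Lyapunov equations by Kronecker-product vectorization (suitable for small systems only), and uses hypothetical two-state matrices rather than the thirteenth order power system model. The names `lyap` and `cost_and_gradient` are introduced here for illustration only.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A' + Q = 0 for X by vectorization (small systems only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    return np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)

def cost_and_gradient(i, A, B, C, F, Q, R, E, V):
    """Cost J_i and the right-hand side of (4.3) for decision maker i.

    B, C, F, Q, R are lists indexed by decision maker; R[i] plays R_ii.
    """
    A_cl = A - sum(B[j] @ F[j] @ C[j] for j in range(len(B)))
    S = lyap(A_cl, E @ V @ E.T)                       # (4.5)
    Q_bar = Q[i] + C[i].T @ F[i].T @ R[i] @ F[i] @ C[i]
    P = lyap(A_cl.T, Q_bar)                           # (4.4)
    J = 0.5 * np.trace(S @ Q_bar)
    G = R[i] @ F[i] @ C[i] @ S @ C[i].T - B[i].T @ P @ S @ C[i].T  # (4.3)
    return J, G

# Hypothetical stable two-state, two-controller example.
A = np.array([[-1.0, 0.5], [0.2, -2.0]])
B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
C = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
F = [np.array([[0.3]]), np.array([[0.4]])]
Q = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
R = [np.eye(1), np.eye(1)]
E, V = np.eye(2), np.eye(2)

J1, G1 = cost_and_gradient(0, A, B, C, F, Q, R, E, V)
print("J1 =", J1, " residual of (4.3) =", G1)
```

A pair of gains is a candidate Nash equilibrium when the residual of (4.3) vanishes for both decision makers. With the conventions used in this sketch the residual equals the gradient of $J_i$ with respect to $F_i$, so it can also drive a gradient iteration.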

Each  controller's  primary  concern  is  to  minimize  the  frequency 


deviations  in  his  own  area.  The  cost  functions  are  symmetric  in  the  sense 
that  each  controller  penalizes  his  own  area  frequency  deviations,  his  area 
power  generation  deviations  and  his  own  control  actions. 


In the example, we compare the Nash equilibrium solutions for two information structures. In case one, each DM measures his own area frequency deviations, and in case two, DM1 has no measurement available for feedback and DM2 still measures his own area frequency deviation.

In order to compare solutions for different information structures, there must exist a unique solution for each case. We have established that the Nash equilibrium solutions exist and are unique for this problem by direct numerical calculation of the reaction curves of each controller. In order to make such a graphical analysis, it is necessary to restrict the number of measurements available to each controller for feedback. Also, some simplifying assumptions in defining each controller's criterion function are necessary to ensure uniqueness of the Nash equilibrium solutions, i.e., the explicit appearance of a penalty on the tie-line power flow deviation in the criterion functions would result in multiple equilibrium solutions under these particular information structures. Although it is not penalized, the steady state tie-line power flow deviation would in fact be zero under constant load disturbances as a result of the constraint that each controller alone must, in the steady state, meet a constant load demand in his own area and the fact that the resultant overall system is stable.

These assumptions regarding the criterion function and the number of available measurements are needed to produce a clear, simple example and are not meant to accurately represent a situation that might be encountered in practice, particularly not for such a small scale system. It is in large scale systems in which such restricted information availability can be expected and in which the impact of the information structure becomes most important.

For comparison we have also calculated the solution for the Stackelberg strategy for the information structure of case one with DM1 as leader. With the Stackelberg strategy, the controllers do not determine their controls simultaneously, as with the Nash strategy, but rather one controller, the leader, will first determine his control and announce his decision to the other controller, the follower, who will then determine his control knowing what the leader's control will be. The leader, in determining his control, takes into account the follower's subsequent optimization. It is assumed that the leader will not deviate from his announced controls and that the leader knows the follower's cost function and is thus able to calculate the follower's reaction to his controls. For a given information structure, the leader will do at least as well as he would playing according to the Nash strategy. Further details and discussion of the Stackelberg strategy can be found in [2], [8], [7], [34], and [6].
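The difference between the two solution concepts can be illustrated on a small static game. The following sketch is a hypothetical two-player quadratic game, not the power system example; the cost functions, the function names, and the search interval are assumptions made for illustration. The Nash solution is found by iterating the two reaction functions, and the Stackelberg solution by minimizing the leader's cost along the follower's reaction curve.

```python
def J1(u1, u2):
    """Leader's cost (hypothetical)."""
    return (u1 + u2 - 2.0) ** 2 + u1 ** 2

def J2(u1, u2):
    """Follower's cost (hypothetical)."""
    return (u1 + u2 - 1.0) ** 2 + u2 ** 2

def react1(u2):
    """DM1's reaction: minimizer of J1 over u1, from dJ1/du1 = 0."""
    return (2.0 - u2) / 2.0

def react2(u1):
    """DM2's reaction: minimizer of J2 over u2, from dJ2/du2 = 0."""
    return (1.0 - u1) / 2.0

def nash(iters=100):
    """Fixed point of the two reaction functions (best-response iteration)."""
    u1, u2 = 0.0, 0.0
    for _ in range(iters):
        u1, u2 = react1(u2), react2(u1)
    return u1, u2

def stackelberg(lo=-5.0, hi=5.0, iters=200):
    """Leader minimizes J1(u1, react2(u1)) by ternary search (cost is convex)."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if J1(a, react2(a)) < J1(b, react2(b)):
            hi = b
        else:
            lo = a
    u1 = 0.5 * (lo + hi)
    return u1, react2(u1)

n1, n2 = nash()
s1, s2 = stackelberg()
print("Nash:        u =", (n1, n2), " J1 =", J1(n1, n2), " J2 =", J2(n1, n2))
print("Stackelberg: u =", (s1, s2), " J1 =", J1(s1, s2), " J2 =", J2(s1, s2))
```

In this toy game the leader's cost under the Stackelberg solution is no larger than under the Nash solution, while the follower's cost is higher.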

The solutions are shown in Figure 4.1. The reaction curves of the two DM's are plotted (where $u_i = -k_iy_i = -k_if_i$, $f_i$ is the frequency deviation for area i) and the various solutions are indicated. RCi indicates DMi's reaction curve, N1 is the Nash equilibrium solution for case one, N2 is the Nash equilibrium solution for case two, and S indicates the Stackelberg solution with DM1 as leader.

Table  1  summarizes  the  various  solutions  and  the  related  costs. 


Table 1.  Resulting Costs for Example

                        CONTROL GAINS            COSTS (x 10^-5)
                        DM1         DM2          DM1        DM2

Nash (case 1)           +.40815     +.40815      6.29       6.29

Nash (case 2)           0           +1.02695     3.49       11.4

Stackelberg             -.199       +1.4171      3.19       14.49
(DM1: leader)

Going down the table we can see that DM1's cost decreases as we go from case one to case two and from case two to the Stackelberg case. It so happens for this problem that DM2's cost is increasing as we go down the table. This is not always the case; examples can be constructed in which both DM's are better off when less information is available to one of them and it is also possible that both DMers can have lower cost when using the Stackelberg strategy than when using the Nash strategy [2].

4.4.  Discussion 

One might reasonably expect that, regardless of the presence of other controllers, if more information is made available to one of the controllers, then that controller would be better off. Why, in case one of the example, could DM1 not simply ignore the available information, reducing the problem to that of case two? The answer to this, it turns out, is the key to understanding the phenomenon.

The Nash equilibrium conditions, inequalities (4.1) and (4.2), can equivalently be thought of as follows. Each controller is performing an optimization, minimizing his cost function over his entire set of admissible controls, subject to the constraint that the other controller is performing his minimization over his entire set of admissible controls (i.e., over all linear feedback rules for the set of available measurements). In order to have a Nash equilibrium, the optimizations must be consistent; each DM's control must be optimal over his entire admissible set of controls given that the other DM is applying his Nash control, i.e., inequalities (4.1) and (4.2) must hold. Since each controller is assuming that the other controller is optimizing over his entire set of admissible controls, they are each constrained by the consistency requirement to optimize over their own entire set of admissible controls. So, inherent in the Nash inequalities is the constraint that each DM must optimize over all admissible controls; measurements cannot be ignored.

This requirement for consistency is what constrains the controllers to use the information in a way that could possibly be detrimental to all of the controllers. Is it possible for a controller to avoid this requirement, allowing him to ignore information? Yes; with the Stackelberg strategy, this is accomplished by the leader. By simply allowing for a precedence of decision making, the Stackelberg strategy frees the leader of the requirement that his control must be optimal for the given reaction of the follower. This not only allows the leader to ignore information, if appropriate, but, as demonstrated in the example, he can use the information to his advantage.

In this chapter, we have presented an example which demonstrates that if a dynamic system is to be controlled by more than one controller, and the controllers are acting according to the Nash equilibrium strategy, then a change in the information available to one or more of the controllers can have a surprising effect. In particular, making more information available to one of the controllers does not necessarily bring about improved performance.

In the next chapter we will consider techniques for changing the information structure used in a Nash strategy with the goal of improving some measure of the overall system performance.

CHAPTER 5

INFORMATION DESIGN


5.1.  Introduction 

In conventional single criterion optimizations, the design of an optimal information structure generally reduces to either the determination of the most informative structure which satisfies certain measurement constraints or to precisely defining the tradeoffs between the cost of acquiring information and the value this information has in terms of its effect on the performance of the control system [35], [36], and [37]. As was demonstrated in the previous chapter, the effect that the information available to one of the controllers has on his performance or on the performance of the other controllers is not quite so self-evident when the decisions are being made according to the Nash strategy. Also, the definition of overall system performance must be made precise if it is to be used in the design of a "better" information structure.

In this chapter we will develop an approach to the design of the information structure that will provide improved performance for the overall system. In order to do this, the design of the information structure for a system in which the controllers are choosing their controls according to a Nash equilibrium strategy must incorporate a precedence relationship, i.e., the Nash equilibrium solution for the controls is done for a given, specified information structure. The information design is not itself a part of the Nash equilibrium conditions but rather is done taking into account the subsequent optimizations being performed according to the Nash strategy. In this sense, the designer of the information structure is behaving as the leader would in a Stackelberg strategy when his control is the information system. The optimization for the information structure must append the subsequent optimizations of the individual DMers.


5.2.  The  Design  Technique 

The  particular  formulation  that  we  will  consider  is  as  follows. 

A linear system with m controllers acting on it is represented by

$$ \dot{x} = Ax + \sum_{i=1}^{m} B_iu_i. \qquad (5.1) $$

The ith controller has measurements

$$ y_i = C_ix \qquad (5.2) $$

and will apply a linear output feedback

$$ u_i = -F_iy_i = -F_iC_ix \qquad (5.3) $$

in an effort to minimize the cost function

$$ J_i = E_{x_0}\Big\{ \tfrac{1}{2} x_f'K_{if}x_f + \tfrac{1}{2} \int_{t_0}^{t_f} \Big( x'Q_ix + \sum_{j=1}^{m} u_j'R_{ij}u_j \Big) dt \Big\} \qquad (5.4) $$

where the expectation over $x_0$ is to remove the dependence of the solution on the initial condition.

The information structure is determined by the output matrices, $C_i$, in (5.2).

We  will  develop  the  information  design  procedure  for  the  case  of 
linear,  static  output  feedback  control  (5.3).  The  extension  to  the  case  of 
the  controllers  using  dynamic  compensation  of  fixed  order  is  straightforward 
and  conceptually  equivalent  [38]. 


The Nash equilibrium output feedback gains $F_1^*, \ldots, F_m^*$ are defined as those gains which satisfy the inequalities

$$ J_i(F_1^*, \ldots, F_m^*) \le J_i(F_1^*, \ldots, F_{i-1}^*, F_i, F_{i+1}^*, \ldots, F_m^*) \qquad (5.5) $$

for all $F_i$, $i = 1, 2, \ldots, m$.

For a given, fixed information structure, (5.2), the necessary conditions for the output feedback gains can be determined by use of the matrix minimum principle [32] and [33]. We will develop these first, in terms of a fixed information structure, and then develop the information design stage.

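The cost functions (5.4) can equivalently be written

$$ J_i = \tfrac{1}{2} \int_{t_0}^{t_f} \operatorname{tr}\{\bar{Q}_iX\}\, dt + \tfrac{1}{2} \operatorname{tr}\{K_{if}X_f\} \qquad (5.6) $$

where

$$ \bar{Q}_i = Q_i + \sum_{j=1}^{m} C_j'F_j'R_{ij}F_jC_j $$

and $X$ satisfies

$$ \dot{X} = \hat{A}X + X\hat{A}', \qquad X(t_0) = X_0 = E\{x_0x_0'\} \qquad (5.7) $$

with $\hat{A} = A - \sum_{j=1}^{m} B_jF_jC_j$.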

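The Hamiltonian is formed, appending the matrix differential equation (5.7),

$$ H_i(X, \Lambda_i, F_j;\ j = 1, \ldots, m) = \tfrac{1}{2} [\operatorname{tr}\{\bar{Q}_iX\} + \operatorname{tr}\{\Lambda_i(\hat{A}X + X\hat{A}')\}], \qquad i = 1, \ldots, m. $$

From the matrix minimum principle, the minimization of (5.6) with respect to $F_i$ in accordance with the Nash strategy (5.5) yields the following necessary conditions

$$ -\dot{\Lambda}_i = \bar{Q}_i + \hat{A}'\Lambda_i + \Lambda_i\hat{A}, \qquad \Lambda_i(t_f) = K_{if} \qquad (5.8i) $$

$$ \frac{\partial H_i}{\partial F_i} = 0 = R_{ii}F_iC_iXC_i' - B_i'\Lambda_iXC_i', \qquad i = 1, 2, \ldots, m. \qquad (5.9i) $$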


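If the feedback gains were constrained to be constant throughout the interval $[t_0, t_f]$ then the condition $\dot{F}_i = 0$ would be appended in the Hamiltonian, which would result in equation (5.9i) being replaced by

$$ \int_{t_0}^{t_f} \frac{\partial H_i}{\partial F_i}\, dt = 0, $$

i.e., for fixed $C_i$,

$$ 0 = \frac{\partial J_i}{\partial F_i} = R_{ii}F_iC_iPC_i' - B_i'\Lambda_iPC_i' \qquad (5.9i') $$

where

$$ P = \int_{t_0}^{t_f} X(t)\, dt. $$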

For a given information structure (5.2), the control gains must satisfy (5.7), (5.8i), and (5.9i) for $i = 1, \ldots, m$. These equations are a two-point boundary value problem and must be solved iteratively. Since expressions for the gradients are known, (5.9i) or (5.9i'), gradient dependent schemes for solving the equations are applicable. Convergence of any approach cannot be assured a priori in that the existence of an equilibrium solution is itself not assured.

We now have a characterization of the behavior of the individual controllers in their determination of feedback gains for a given information structure. The second phase of the problem, the design of the information structure, can now be developed.

We assume that the information matrices will be chosen to minimize a cost function $J_0$. It is assumed that this cost function, in some sense, represents an overall system cost which is to be minimized by the choice of the $C_i$ matrices. This might, for example, be a Pareto-optimal cost function agreed on by all of the individual decision makers, i.e., if they agree on the relative importance of their individual costs as expressed by the $a_i$'s then

$$ J_0 = \sum_{i=1}^{m} a_iJ_i, \qquad a_i > 0, \quad \sum_{i=1}^{m} a_i = 1. $$

Alternately,  the  overall  cost  function  might  be  a  Lyapunov  function  in  which 
case  the  information  structure  is  chosen  to  provide  stabilization  under  the 
subsequent  control  actions. 
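As an illustration of the weighted cost, the two Nash rows of Table 1 can be compared for different weights. The sketch below uses the costs reported in Table 1 (in units of $10^{-5}$); the weights and the function names `overall_cost` and `best_structure` are hypothetical choices made for illustration.

```python
# Costs (J1, J2) from Table 1, in units of 1e-5, for the two Nash cases.
COSTS = {
    "case 1": (6.29, 6.29),   # each DM measures his own area frequency
    "case 2": (3.49, 11.4),   # DM1 has no measurement
}

def overall_cost(costs, a1):
    """Weighted cost J0 = a1*J1 + a2*J2 with a2 = 1 - a1 and 0 < a1 < 1."""
    j1, j2 = costs
    return a1 * j1 + (1.0 - a1) * j2

def best_structure(a1):
    """Information structure with the smallest weighted cost J0."""
    return min(COSTS, key=lambda name: overall_cost(COSTS[name], a1))

for a1 in (0.5, 0.9):
    print(a1, best_structure(a1))
```

With equal weights the structure of case one gives the smaller overall cost, whereas weighting DM1's cost heavily favors case two, so the preferred information structure depends on the agreed weights.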

The optimization for the $C_i$'s is done with each $C_i$ held to a fixed allowable maximum rank. Therefore as a special case, one may allow each $C_i$ to attain a maximum rank equal to the dimension of the system, thereby admitting full state feedback as an allowable information structure. Note that full state feedback will not necessarily result as being the optimum structure since, as was demonstrated in the previous chapter, more information is not necessarily better. More realistically, there may be only a limited set of measurements available to begin with, e.g., certain states may not be directly measurable at all or only certain states are measurable by certain controllers, allowing for conditions such as geographic separation. These conditions can be treated by assuming the measurements to be in the form

$$ y_i = C_iD_ix, \qquad i = 1, \ldots, m $$

where $D_i$ is a fixed matrix and, as before, $C_i$ is the matrix to be determined in the optimization.

Certain special cases are of interest. In particular, if for some i, an initial guess for the information structure is taken to be $C_i = I$, $C_j = 0$, $j \ne i$, then decision maker i is faced with a conventional full state feedback optimization and the remaining controllers are constrained to take no action. This provides a convenient starting point for the iterative calculation of the information structure.

The  necessary  conditions  for  the  information  structure  design  are 
presented  in  the  sequel. 

For the given $J_0$, the Hamiltonian is formed to append the necessary conditions which characterize the Nash equilibrium solution for the feedback gains. [The leading terms of this Hamiltonian are illegible in the source; its final term is]

$$ \cdots + \tfrac{1}{2} \operatorname{tr}\Big\{ \Gamma_0 \Big[ \Big( A - \sum_{j=1}^{m} B_jF_jC_j \Big) X + X \Big( A' - \sum_{j=1}^{m} C_j'F_j'B_j' \Big) \Big] \Big\}. $$

By the matrix minimum principle the necessary conditions are found to be the following:

[Equations (5.10) through (5.13) are illegible in the source.]

Or, for the optimal time-invariant $C_i$'s, equation (5.13) is replaced by

$$ \frac{\partial J_0}{\partial C_i} = \int_{t_0}^{t_f} \frac{\partial H}{\partial C_i}\, dt = 0 \qquad (5.13') $$

which follows from appending the conditions

$$ \dot{C}_i = 0. $$


For a given information structure, the feedback gains are found
by solving (5.7), (5.8i), and (5.9i) for $i = 1, \ldots, m$. This provides us with
the $F_i$, $X$, and the $\Lambda_i$, $i = 1, \ldots, m$. The equations (5.11) and (5.12) are then
solved for $\Gamma_0$ and $P_{0i}$ for $i = 1, \ldots, m$, where the algebraic equations of (5.10)
are solved to eliminate the $\beta_i$ for $i = 1, \ldots, m$. The gradient (5.13) or (5.13')
can now be evaluated, new $C_i$'s determined, and the process repeated
until convergence is obtained or an adequate improvement in performance
is attained.
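
The outer iteration described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the text: the leader's cost is passed in as a hypothetical callable `leader_cost(params)` that is assumed to solve the followers' Nash problem internally for the given structure, the entries of the information structure matrices are flattened into a list `params`, and the adjoint gradient of (5.13) or (5.13') is replaced by a central finite difference.

```python
def numerical_gradient(cost, params, h=1e-5):
    """Central-difference estimate of the gradient of cost at params."""
    grad = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += h
        down[k] -= h
        grad.append((cost(up) - cost(down)) / (2.0 * h))
    return grad


def design_information_structure(leader_cost, params0, step=0.1,
                                 tol=1e-8, max_iter=500):
    """Fixed-step gradient iteration on the structure parameters.

    leader_cost is a hypothetical callable: it is assumed to compute the
    followers' equilibrium gains for the given structure and return the
    leader's cost. The loop stops when the update is smaller than tol or
    after max_iter passes.
    """
    params = list(params0)
    for _ in range(max_iter):
        grad = numerical_gradient(leader_cost, params)
        new_params = [p - step * g for p, g in zip(params, grad)]
        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        if change < tol:
            break
    return params


# Toy stand-in for the leader's cost (assumed for illustration only).
def toy_cost(p):
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 3.0) ** 2


result = design_information_structure(toy_cost, [0.0, 0.0])
```

A fixed step size is the simplest choice; since continuity of the equilibrium with respect to the structure parameters is not guaranteed, a practical implementation would likely need a line search or a safeguard against steps that increase the cost.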

The amount of computation is quite significant but, fortunately,
these computations are done only once, and by only one decision maker,
at the time the information structure is being chosen. These computations
are transparent to all of the individual controllers, since they merely
determine their feedback gains for the information structure which, at that
time, is determined and fixed. Thus, the precedence relationship of the
optimizations isolates the individual controllers from the major computational
task of the information structure design. This is similar to
the advantage found in the sampled data formulation of Chapter 2. The
case in which the available information is restricted, as represented by

$$y_i = C_i D_i x,$$

is conceptually equivalent and the necessary conditions are developed along
identical lines.

For the case in which the problem is defined over an infinite
horizon, the necessary conditions for the output feedback gains and for the
information structure matrices reduce to a set of algebraic equations.
For this case, the final necessary conditions are the following.

For  a  given  information  structure,  the  output  feedback  gains 
must  satisfy 

A'*i  ♦  Aj*  *  ■  0 


(5.14) 
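
As an illustration of the kind of algebraic equation that arises over an infinite horizon, the sketch below evaluates one controller's quadratic cost for fixed scalar output feedback gains by solving a Lyapunov equation. It assumes the standard result that, for a stable closed-loop matrix $A_c = A - \sum_j B_j f_j C_j$ and a random initial state with second moment $X_0$, the cost is $J_i = \tfrac{1}{2}\mathrm{tr}(P_i X_0)$ with $A_c' P_i + P_i A_c + Q_i + f_i^2 r_{ii} C_i' C_i = 0$. The function names, the symbols $P_i$ and $X_0$, and the numbers are assumptions made for the illustration and are not taken from the text.

```python
def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular linear system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def solve_lyapunov(Ac, Qbar):
    """Solve Ac' P + P Ac + Qbar = 0 for the n x n matrix P."""
    n = len(Ac)
    M = [[0.0] * (n * n) for _ in range(n * n)]
    rhs = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            row = i * n + j
            for k in range(n):
                M[row][k * n + j] += Ac[k][i]  # term (Ac' P)[i][j]
                M[row][i * n + k] += Ac[k][j]  # term (P Ac)[i][j]
            rhs[row] = -Qbar[i][j]
    z = solve_linear(M, rhs)
    return [[z[i * n + j] for j in range(n)] for i in range(n)]


def closed_loop_cost(A, Bs, Cs, fs, Q_i, r_ii, i, X0):
    """Cost J_i = 1/2 tr(P_i X0) for scalar controls u_j = -f_j C_j x."""
    n = len(A)
    Ac = [list(row) for row in A]
    for B, C, f in zip(Bs, Cs, fs):
        for p in range(n):
            for q in range(n):
                Ac[p][q] -= B[p] * f * C[q]
    Qbar = [[Q_i[p][q] + r_ii * fs[i] ** 2 * Cs[i][p] * Cs[i][q]
             for q in range(n)] for p in range(n)]
    P = solve_lyapunov(Ac, Qbar)
    return 0.5 * sum(P[p][q] * X0[q][p] for p in range(n) for q in range(n))


# Illustrative numbers only (not the example of Section 5.3).
A = [[-1.0, 0.0], [0.0, -2.0]]
Bs = [[1.0, 0.0], [0.0, 1.0]]
Cs = [[1.0, 0.0], [0.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
open_loop = closed_loop_cost(A, Bs, Cs, [0.0, 0.0], I2, 1.0, 0, I2)
```

The code does not check that the closed loop is stable; for an unstable $A_c$ the Lyapunov solution may still exist but no longer represents the cost.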




An  iterative  procedure  is  possible  using  the  gradient  information 
supplied  by  (5.20). 

Equation (5.18) is used to remove the dependence of (5.17), (5.19),
and (5.20) on the $S_i$'s, and if $C_j$ is not full rank, the equation is still
solvable since

$$\mathcal{R}(C_j L C_j') = \mathcal{R}(C_j)$$

for $L > 0$, where $\mathcal{R}(\cdot)$ denotes range space.
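
The range-space identity can be checked with a short argument. The following is a sketch, assuming $L$ is real, symmetric, and positive definite, so that it has an invertible symmetric square root $L^{1/2}$; the symbol $M$ is introduced here only for the derivation.

```latex
\begin{aligned}
M &:= C_j L^{1/2}
  \quad\Longrightarrow\quad C_j L C_j' = M M', \\
M M' v = 0
  &\;\Longrightarrow\; v' M M' v = \| M' v \|^2 = 0
  \;\Longrightarrow\; M' v = 0,
  \quad\text{so } \mathcal{N}(M M') = \mathcal{N}(M'), \\
\mathcal{R}(M M') &= \mathcal{N}(M M')^{\perp}
  = \mathcal{N}(M')^{\perp} = \mathcal{R}(M), \\
\mathcal{R}(M) &= \mathcal{R}(C_j L^{1/2}) = \mathcal{R}(C_j)
  \quad\text{since } L^{1/2} \text{ is invertible.}
\end{aligned}
```

Hence any right-hand side lying in $\mathcal{R}(C_j)$ also lies in $\mathcal{R}(C_j L C_j')$, which is why the equation remains solvable when $C_j$ loses rank.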


5.3. An Example

For  a  simple  example  to  demonstrate  the  improvement  in  performance 
attainable  through  changes  in  the  information  structure,  consider  the 
following. 

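The system

$$\dot{x} = Ax + B_1 u_1 + B_2 u_2$$

is second order with $u_1$ and $u_2$ both scalar,

$$A = \begin{bmatrix} -10 & 0 \\ 0 & -10 \end{bmatrix}, \qquad B_1 = \begin{bmatrix} -1 \\ +1 \end{bmatrix}, \qquad B_2 = \begin{bmatrix} +1 \\ -1 \end{bmatrix},$$

$$u_i = -f_i y_i = -f_i C_i x.$$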


The information structure is given by

$$C_1 = C_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \end{bmatrix}.$$


The cost functions

$$J_i = \tfrac{1}{2} E\left\{ \int_0^\infty (x' Q_i x + u_i' R_{ii} u_i)\,dt \right\}$$

are specified by

$$Q_1 = \begin{bmatrix} 0 & 0 \\ 0 & 1000 \end{bmatrix}, \qquad Q_2 = \begin{bmatrix} 1000 & 0 \\ 0 & 0 \end{bmatrix}.$$

For this given nominal information structure, the behavior of the controllers
is best illustrated by calculating their reaction curves. Due to the
symmetry of the problem, the reaction curves are symmetric with respect to
one another across the 45° line. Figure 5.1 illustrates the reaction curves,
where $f_i(f_j)$ denotes controller $i$'s optimal feedback gain as a function of
$f_j$, $j \neq i$, $i = 1, 2$. The intersection of the reaction curves is the point which
satisfies the Nash inequalities. At the Nash equilibrium point,


$$J_1 = J_2 = 47$$

for which

$$J_0 = \tfrac{1}{2} J_1 + \tfrac{1}{2} J_2 = 47.$$
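
The intersection of two reaction curves can be located numerically by alternating best responses. The sketch below is a toy illustration only: the affine reaction function is an assumed stand-in chosen so that the two curves are mirror images across the 45° line, and it is not derived from the cost functions of this example. The function names are hypothetical.

```python
def nash_by_best_response(react1, react2, f1=0.0, f2=0.0,
                          tol=1e-10, max_iter=1000):
    """Alternate best responses until neither gain changes by more than tol.

    react1(f2) returns controller 1's best gain given f2, and react2(f1)
    returns controller 2's best gain given f1. Convergence is not
    guaranteed in general; it holds here because the toy map contracts.
    """
    for _ in range(max_iter):
        new_f1 = react1(f2)
        new_f2 = react2(new_f1)
        if abs(new_f1 - f1) < tol and abs(new_f2 - f2) < tol:
            return new_f1, new_f2
        f1, f2 = new_f1, new_f2
    raise RuntimeError("best-response iteration did not converge")


def toy_reaction(f_other):
    """Assumed affine reaction curve, used by both controllers."""
    return 1.0 + 0.5 * f_other


f1_star, f2_star = nash_by_best_response(toy_reaction, toy_reaction)
```

Because both controllers use the same reaction function, the computed equilibrium lies on the 45° line, mirroring the symmetry noted for the reaction curves in the text.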



Now  let  us  consider  the  effect  that  a  change  in  the  information  structure  can 
have. 


We will consider variations in the information structure parameterized
in terms of one parameter as follows. Let

$$C_1 = [\sin\theta \quad \cos\theta]$$

$$C_2 = [\cos\theta \quad \sin\theta].$$

Notice that by parameterizing the information structure in terms of $\theta$, we
are maintaining

$$\|C_i(\theta)\| = 1.$$

Variation of $\theta$ is a rotation of the measurement vectors in state space.
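
A small sketch of this parameterization, using only the definitions of the two measurement vectors given above; the function name `information_structure` is an assumption made for the illustration.

```python
import math


def information_structure(theta_deg):
    """Return (C1, C2) as row vectors for an angle theta in degrees."""
    t = math.radians(theta_deg)
    c1 = (math.sin(t), math.cos(t))
    c2 = (math.cos(t), math.sin(t))
    return c1, c2


# Nominal structure: both controllers measure along (1/sqrt(2)) [1 1].
c1_nominal, c2_nominal = information_structure(45.0)

# At -45 degrees the two measurement vectors point in opposite directions.
c1_rotated, c2_rotated = information_structure(-45.0)
```

Both vectors have unit length for every angle, so varying the angle changes only the direction in which each controller observes the state, not the scaling of its measurement.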

The reaction curves and equilibrium points for a few values of $\theta$
are shown in Figure 5.2.

In particular, the optimal value of $\theta$ is found to be $\theta = -45^\circ$, for
which $J_0 = 16.3$. The reaction curves for this case are shown in Figure 5.2c.


Figure 5.2b. Reaction curves for $\theta = 0^\circ$.

Figure 5.2c. Reaction curves.

Figure 5.2d. Reaction curves for $\theta = -60^\circ$.


For this information structure the costs incurred by each controller and the
overall cost are all reduced to approximately one-third of their values
for the original nominal information structure, which corresponds to $\theta = +45^\circ$.

5.4.  Conclusions 

The optimization for the information structure must append the
necessary conditions which characterize the subsequent calculations for the
Nash equilibrium solution. The continuity of the Nash equilibrium conditions
with respect to the parameters of the information structure is not ensured
and requires further investigation if any assurances are sought for a
well-behaved, convergent algorithm. In fact, an algorithm for the calculation of
Nash equilibrium output feedback gains for a given information structure, with
conditions which guarantee convergence, would be significant in its own right.

CONCLUSIONS

The decentralized control problem where the individual controllers
have different goals has been considered. We have focused on the role of the
Stackelberg strategy, particularly its application to the coordination
of many controllers.

Several  issues  related  to  the  applicability  of  the  strategy  have 
been  dealt  with,  resulting  in  tractable,  efficient  algorithms.  The  structure 
of  the  solution  to  a  sampled  data  formulation  has  been  exploited  to  obtain 
particularly  efficient  solution  techniques. 

The  existence  of  solutions  satisfying  the  Stackelberg  strategy 
cannot  in  general  be  assured  a  priori.  Conditions  sufficient  for  guaranteeing 
the  existence  of  a  solution  satisfying  a  Stackelberg  strategy  have  been 
developed.  These  are  merely  sufficient  conditions  and  there  is  need  for 
further  development. 

The impact that the information structure can have on a solution
satisfying the Nash strategy has been illustrated by means of an example.
The example serves as motivation for the subsequent section, in which an approach
to the design of the information structure has been developed which exploits
the precedence nature of the Stackelberg strategy. The information structure
alone is manipulated in an effort to coordinate the subsequent actions of
the controllers. As with most coordination schemes using the Stackelberg
strategy, the activities of the leader or coordinator are transparent to the
individual controllers.